Posted to user@hadoop.apache.org by 周梦想 <ab...@gmail.com> on 2012/12/20 10:48:00 UTC

why not hadoop backup name node data to local disk daily or hourly?

For some reason my name node data became corrupted, and the corrupted data
also overwrote the secondary name node data as well as the NFS backup. I
want to recover the name node data from a day ago, or even a week ago, but
I can't. Do I have to back up the name node data manually, or write a bash
script to do it? Why doesn't Hadoop offer a configuration option to back up
the name node data to local disk daily or hourly, with a different
timestamp in each file name?

The same question applies to HBase's .META. and -ROOT- tables. I think
keeping their history is a hundred times more important than keeping the
log history.

I think this could be implemented in the Secondary NameNode, Checkpoint
Node, or Backup Node. For now I do it with a bash script, roughly as
sketched below.
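For illustration, a minimal sketch of that kind of script (the paths and
the retention count are only examples, not our real setup):

#!/bin/bash
# Hypothetical timestamped backup of the NameNode metadata directory.
# Assumes dfs.name.dir is /data/hadoop/name and backups go to /backup/namenode;
# adjust both paths for your cluster.
set -e
NAME_DIR=/data/hadoop/name
BACKUP_ROOT=/backup/namenode
STAMP=$(date +%Y%m%d-%H%M%S)

mkdir -p "$BACKUP_ROOT"

# Copy the current fsimage/edits files into a timestamped directory.
cp -a "$NAME_DIR" "$BACKUP_ROOT/namenode-$STAMP"

# Keep only the 30 most recent copies.
ls -1dt "$BACKUP_ROOT"/namenode-* | tail -n +31 | xargs -r rm -rf

A cron entry then runs something like this hourly or daily.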

Does anyone agree with me?


Best Regards,
Andy Zhou

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Robert Dyer <ps...@gmail.com>.
I actually have this exact same error. After running my namenode for a
while (with an SNN), it gets to a point where the SNN starts crashing, and
if I try to restart the NN I hit this problem. I typically wind up going
back to a much older copy of the image and edits files in order to get it
up and running again, and naturally that means data loss.
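Roughly, the manual restore looks like the sketch below (the paths are
placeholders; it assumes an older timestamped copy of the metadata exists,
like the one Andy describes, and everything written after that copy is
lost):

# Stop the NameNode before touching its metadata directory.
bin/hadoop-daemon.sh stop namenode

NAME_DIR=/data/hadoop/name                            # dfs.name.dir (assumed path)
OLD_COPY=/backup/namenode/namenode-20121210-020000    # older backup (assumed path)

# Set the broken metadata aside and restore the older copy.
mv "$NAME_DIR" "$NAME_DIR.broken.$(date +%s)"
cp -a "$OLD_COPY" "$NAME_DIR"

# Start the NameNode again from the restored image and edits.
bin/hadoop-daemon.sh start namenode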

On Mon, Dec 24, 2012 at 8:22 PM, 周梦想 <ab...@gmail.com> wrote:

> thanks Tariq,
> Now we are trying to recover data,but some data has lost forever.
>
> the logs just reported NULL Point Exception:
>
> 2012-12-17 17:09:05,646 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1094)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1106)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1009)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:626)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1015)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:833)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
>
> We changed the source of hadoop to try catch this exception and rebuild
> it, then we can start hadoop NN, but the problem of HBase remained.
> so we have to upgrade the version of HBase and try to repair HBase Meta
> data from Regins data.
> Now we are planning to upgrade to stable version of hadoop 1.0.4 and HBase
> 0.94.3.
>
> Best regards,
> Andy
>
> 2012/12/24 Mohammad Tariq <do...@gmail.com>
>
>> Hello Andy,
>>
>>      I hope you are stable now :)
>>
>> Just a quick question. Did you find anything interesting in the NN, SNN,
>> DN logs?
>>
>> And my grandma says, I look like Abhishek Bachchcan<http://en.wikipedia.org/wiki/Abhishek_Bacchan>;)
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Mon, Dec 24, 2012 at 4:24 PM, 周梦想 <ab...@gmail.com> wrote:
>>
>>> I stoped the Hadoop, changed every nodes' IP and configured again, and
>>> started Hadoop again. Yes, we did change the IP of NN.
>>>
>>>
>>> 2012/12/24 Nitin Pawar <ni...@gmail.com>
>>>
>>>> what do you mean by this "We changed all IPs of the Hadoop System"
>>>>
>>>> You changed the IPs of the nodes in one go? or you retired nodes one by
>>>> one and changed IPs and brought them back in rotation? Also did you change
>>>> IP of your NN as well ?
>>>>
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ab...@gmail.com> wrote:
>>>>
>>>>> Actually the problem was beggining at SecondNameNode. We changed all
>>>>> IPs of the Hadoop System
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by 周梦想 <ab...@gmail.com>.
Thanks Tariq,
Now we are trying to recover the data, but some data has been lost forever.

The logs just reported a NullPointerException:


2012-12-17 17:09:05,646 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode:
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1094)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1106)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1009)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:626)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1015)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:833)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)

We changed the Hadoop source to try/catch this exception and rebuilt it;
after that we could start the Hadoop NN, but the HBase problem remained, so
we have to upgrade HBase and try to repair the HBase meta data from the
region data. Now we are planning to upgrade to the stable versions, Hadoop
1.0.4 and HBase 0.94.3.

Best regards,
Andy

2012/12/24 Mohammad Tariq <do...@gmail.com>

> Hello Andy,
>
>      I hope you are stable now :)
>
> Just a quick question. Did you find anything interesting in the NN, SNN,
> DN logs?
>
> And my grandma says, I look like Abhishek Bachchcan<http://en.wikipedia.org/wiki/Abhishek_Bacchan>;)
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Mon, Dec 24, 2012 at 4:24 PM, 周梦想 <ab...@gmail.com> wrote:
>
>> I stoped the Hadoop, changed every nodes' IP and configured again, and
>> started Hadoop again. Yes, we did change the IP of NN.
>>
>>
>> 2012/12/24 Nitin Pawar <ni...@gmail.com>
>>
>>> what do you mean by this "We changed all IPs of the Hadoop System"
>>>
>>> You changed the IPs of the nodes in one go? or you retired nodes one by
>>> one and changed IPs and brought them back in rotation? Also did you change
>>> IP of your NN as well ?
>>>
>>>
>>>
>>> On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ab...@gmail.com> wrote:
>>>
>>>> Actually the problem was beggining at SecondNameNode. We changed all
>>>> IPs of the Hadoop System
>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Andy,

     I hope you are stable now :)

Just a quick question. Did you find anything interesting in the NN, SNN, DN
logs?

And my grandma says I look like Abhishek Bachchan
<http://en.wikipedia.org/wiki/Abhishek_Bacchan> ;)

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Mon, Dec 24, 2012 at 4:24 PM, 周梦想 <ab...@gmail.com> wrote:

> I stoped the Hadoop, changed every nodes' IP and configured again, and
> started Hadoop again. Yes, we did change the IP of NN.
>
>
> 2012/12/24 Nitin Pawar <ni...@gmail.com>
>
>> what do you mean by this "We changed all IPs of the Hadoop System"
>>
>> You changed the IPs of the nodes in one go? or you retired nodes one by
>> one and changed IPs and brought them back in rotation? Also did you change
>> IP of your NN as well ?
>>
>>
>>
>> On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ab...@gmail.com> wrote:
>>
>>> Actually the problem was beggining at SecondNameNode. We changed all IPs
>>> of the Hadoop System
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by 周梦想 <ab...@gmail.com>.
I stopped Hadoop, changed every node's IP, configured everything again, and
started Hadoop again. Yes, we did change the IP of the NN.

2012/12/24 Nitin Pawar <ni...@gmail.com>

> what do you mean by this "We changed all IPs of the Hadoop System"
>
> You changed the IPs of the nodes in one go? or you retired nodes one by
> one and changed IPs and brought them back in rotation? Also did you change
> IP of your NN as well ?
>
>
>
> On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ab...@gmail.com> wrote:
>
>> Actually the problem was beggining at SecondNameNode. We changed all IPs
>> of the Hadoop System
>
>
>
>
> --
> Nitin Pawar
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Nitin Pawar <ni...@gmail.com>.
What do you mean by "We changed all IPs of the Hadoop System"?

Did you change the IPs of the nodes in one go, or did you retire nodes one
by one, change their IPs, and bring them back into rotation? And did you
change the IP of your NN as well?



On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <ab...@gmail.com> wrote:

> Actually the problem was beggining at SecondNameNode. We changed all IPs
> of the Hadoop System




-- 
Nitin Pawar

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by 周梦想 <ab...@gmail.com>.
Thanks to Harsh and Mohammad. Because of the data crash I got ill, so I am
replying late...

2012/12/20 Harsh J <ha...@cloudera.com>

> Hi,
>
> On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <ab...@gmail.com> wrote:
> > Some reasons lead to my name node data error, but the error data also
> > overwrite the second name node data, also the NFS backup. I want to
> recover
> > the name node data a day ago or even a week ago,but I can't.
>
> The SecondaryNameNode does this, and that is also why it is
> recommended to run. In HA HDFS, the StandbyNameNode does the same
> action of checkpoints as SecondaryNameNode, to achieve the same
> periodic goal.
>

Actually the problem began at the SecondaryNameNode. We changed all the IPs
of the Hadoop system. It ran ok for about 2 hours. Then my monitor script
sent me an email that the SNN had exited. It couldn't be started again;
every time it reported a NullPointerException. So we tried to stop the
whole Hadoop system and start it again. But unfortunately, this time even
the NN couldn't start, and it reported the same error.
After that we tried several approaches, but none of them worked, including
importing the checkpoint from the SNN. We found that every copy of the
NameNode metadata was corrupted. Then we removed edits.new and reset the
edits file, and the NN started ok, but HBase began complaining that it
could not find blocks; even the .META. table had errors, and hbck reported
many block errors.

We wanted to change the IPs back to the old ones, but the problems still
remain. We can't even roll back to the NN data from before the IP change.
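For reference, this is roughly how we checked the damage (standard
commands; the report path is just an example):

# Report missing or corrupt blocks from the NameNode's point of view.
bin/hadoop fsck / -files -blocks -locations > /tmp/fsck-report.txt

# Check HBase region and .META. consistency.
bin/hbase hbck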


> This form of corruption at the SNN too should *never* occur normally,
> and your SNN last-checkpoint-time should be actively monitored to not
> grow too old (a sign of issues). Your version of Hadoop probably is
> still affected by https://issues.apache.org/jira/browse/HDFS-3652 and
> you should update to avoid loss due to it?
>
> Also, if you ever suspect a local copy of NN to be bad, save its
> namespace (hadoop dfsadmin -saveNamespace, requires NN be put in
> safemode first) before you bring it down - this saves a copy from the
> memory onto the disk.
>
> > I have to back
> > up name node data manually or write a bash script to backup it? why
>  hadoop
> > does not give a configure to   backup name node data to local disk daily
> or
> > hourly with different time stamp name?
>
> If the NN's disk itself is corrupt, backing it up would be no good
> either, so this solution vs. SNN still doesn't solve anything of your
> original issue.
>

The NN and SNN design only protects against one machine being corrupted,
but it can't roll back to an earlier point in time. If for some reason
errors get into the NN and spread to the SNN, do we have any quick way to
recover?


> > The same question is to HBase's .META. and -ROOT- table. I think it's
> > history storage is more important 100  times than the log history.
>
> The HBase .META. and -ROOT- are already on HDFS, so are pretty
> reliable (with HBase's WAL and 3x replication of blocks).
>

But it is exactly because of the Hadoop NN's problem that HBase can't find
its tables and data.


> > I think it could be implemented in Second Name Node/Check Points Node or
> > Back Node. Now I do this just using bash script.
>
> I don't think using a bash script to backup the metadata is a better
> solution than relying on the SecondaryNameNode. Two reasons: It does
> the same form of a copy-backup (no validation like SNN does), and it
> does not checkpoint (i.e. merge the edits into the fsimage).
>

I'm using the SNN too, but I'm afraid of both the NameNode and SNN data
becoming corrupted.

Thanks!

Andy zhou

>
> --
> Harsh J
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Harsh J <ha...@cloudera.com>.
Hi,

On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <ab...@gmail.com> wrote:
> Some reasons lead to my name node data error, but the error data also
> overwrite the second name node data, also the NFS backup. I want to recover
> the name node data a day ago or even a week ago,but I can't.

The SecondaryNameNode does this, and that is also why it is
recommended to run. In HA HDFS, the StandbyNameNode does the same
action of checkpoints as SecondaryNameNode, to achieve the same
periodic goal.

This form of corruption at the SNN too should *never* occur normally,
and your SNN last-checkpoint-time should be actively monitored to not
grow too old (a sign of issues). Your version of Hadoop probably is
still affected by https://issues.apache.org/jira/browse/HDFS-3652 and
you should update to avoid loss due to it?
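A crude check of the checkpoint age can be as simple as the sketch below
(the checkpoint path and the alert address are assumptions; point it at
your fs.checkpoint.dir):

#!/bin/bash
# Warn if the SecondaryNameNode's latest checkpoint image is older than 2 hours.
CHECKPOINT_IMAGE=/data/hadoop/namesecondary/current/fsimage   # assumed fs.checkpoint.dir
MAX_AGE=$((2 * 60 * 60))

AGE=$(( $(date +%s) - $(stat -c %Y "$CHECKPOINT_IMAGE") ))
if [ "$AGE" -gt "$MAX_AGE" ]; then
  echo "Last SNN checkpoint is ${AGE}s old" | mail -s "SNN checkpoint too old" admin@example.com
fi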

Also, if you ever suspect a local copy of NN to be bad, save its
namespace (hadoop dfsadmin -saveNamespace, requires NN be put in
safemode first) before you bring it down - this saves a copy from the
memory onto the disk.
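Concretely (run as the HDFS superuser):

# Freeze the namespace, persist the in-memory state to disk, then unfreeze.
bin/hadoop dfsadmin -safemode enter
bin/hadoop dfsadmin -saveNamespace
bin/hadoop dfsadmin -safemode leave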

> I have to back
> up name node data manually or write a bash script to backup it? why  hadoop
> does not give a configure to   backup name node data to local disk daily or
> hourly with different time stamp name?

If the NN's disk itself is corrupt, backing it up would be no good
either, so this solution vs. SNN still doesn't solve anything of your
original issue.

> The same question is to HBase's .META. and -ROOT- table. I think it's
> history storage is more important 100  times than the log history.

The HBase .META. and -ROOT- are already on HDFS, so are pretty
reliable (with HBase's WAL and 3x replication of blocks).

> I think it could be implemented in Second Name Node/Check Points Node or
> Back Node. Now I do this just using bash script.

I don't think using a bash script to backup the metadata is a better
solution than relying on the SecondaryNameNode. Two reasons: It does
the same form of a copy-backup (no validation like SNN does), and it
does not checkpoint (i.e. merge the edits into the fsimage).

--
Harsh J

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by 周梦想 <ab...@gmail.com>.
Hi Tariq,
Thanks for your patience. I know that the fsimage stores the block metadata. I
have three machines backing it up, so I am not worried about losing it. I am
using the SNN and NFS to back up the NN data files. But, as described above,
the damaged data dirtied every node that I backed up to automatically.

BTW: you look like the actor who played Pi in the movie "Life of Pi" :)
Best regards,
Andy Zhou

2012/12/20 Mohammad Tariq <do...@gmail.com>

> Hello Andy,
>
>             NN stores all the metadata in a file called as "fsimage". The
> fsimage file contains a snapshot of the HDFS metadata. Along with fsimage
> NN also holds  "edit log" files. Whenever there is a change to HDFS, it
> gets appended to the edits file. When these log files grow big, they are
> merged together with fsimage file. These files are stored on the local FS
> at the path specified by the "dfs.name.dir" property in "hdfs-site.xml"
> file. To prevent any loss you can give multiple locations as the value for
> this property, say 1 on your local disk and another on a network drive in
> case you HD get crashed you still have the metadata safe with you in that
> network drive.(The condition which you have faced recently)
>
> Now, coming to the SNN. It is a helper node for the NN. SNN periodically
> pulls the fsimage file, which would have grown quite big by now. And the NN
> starts the cycle again. Suppose, you are ruuning completely out of luck and
> loose the entire NN. In such a case you can take his copy of fsimage from
> the SNN and retrieve your metadata back.
>
> HTH
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <ab...@gmail.com> wrote:
>
>> Some reasons lead to my name node data error, but the error data also
>> overwrite the second name node data, also the NFS backup. I want to recover
>> the name node data a day ago or even a week ago,but I can't. I have to back
>> up name node data manually or write a bash script to backup it? why  hadoop
>> does not give a configure to   backup name node data to local disk daily or
>>  hourly with different time stamp name?
>>
>> The same question is to HBase's .META. and -ROOT- table. I think it's
>> history storage is more important 100  times than the log history.
>>
>> I think it could be implemented in Second Name Node/Check Points Node or
>> Back Node. Now I do this just using bash script.
>>
>> Some one agree with me?
>>
>>
>> Best Regards,
>> Andy Zhou
>>
>
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Mohammad Tariq <do...@gmail.com>.
Ohhhh...this is the benefit of sharing space with you. Thank you so much for
keeping my knowledge base updated. It's high time I did a proper re-scan
of everything.

@Andy: Now I'm truly sorry for passing on the wrong info.

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Thu, Dec 20, 2012 at 4:04 PM, Harsh J <ha...@cloudera.com> wrote:

> Hi Mohammad,
>
> On Thu, Dec 20, 2012 at 3:54 PM, Mohammad Tariq <do...@gmail.com>
> wrote:
> > I am sorry Andy, I forgot one important point.
> >
> >  The Secondary NameNode has been deprecated now, so consider using the
> > Checkpoint Node or Backup Node. Checkpoint Node is the process which is
> > actually responsible for creating periodic check points. It downloads
> > fsimage and log edits from the active NameNode, merges them locally, and
> > uploads the new image back to the active NameNode.
>
> This isn't true anymore. We are continuing to keep the SNN and have
> undeprecated it. See https://issues.apache.org/jira/browse/HDFS-2397.
> We are perhaps deprecating the CheckpointNode though:
> https://issues.apache.org/jira/browse/HDFS-4114.
>
> --
> Harsh J
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Harsh J <ha...@cloudera.com>.
Hi Mohammad,

On Thu, Dec 20, 2012 at 3:54 PM, Mohammad Tariq <do...@gmail.com> wrote:
> I am sorry Andy, I forgot one important point.
>
>  The Secondary NameNode has been deprecated now, so consider using the
> Checkpoint Node or Backup Node. Checkpoint Node is the process which is
> actually responsible for creating periodic check points. It downloads
> fsimage and log edits from the active NameNode, merges them locally, and
> uploads the new image back to the active NameNode.

This isn't true anymore. We are continuing to keep the SNN and have
undeprecated it. See https://issues.apache.org/jira/browse/HDFS-2397.
We are perhaps deprecating the CheckpointNode though:
https://issues.apache.org/jira/browse/HDFS-4114.

--
Harsh J

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Mohammad Tariq <do...@gmail.com>.
I am sorry Andy, I forgot one important point.

 The Secondary NameNode has been deprecated now, so consider using the
Checkpoint Node or Backup Node. The Checkpoint Node is the process that is
actually responsible for creating periodic checkpoints: it downloads the
fsimage and edit logs from the active NameNode, merges them locally, and
uploads the new image back to the active NameNode.

*It is advisable to run the Checkpoint Node on a different machine, as it
consumes almost as much memory as the NameNode itself.

You can start the Checkpoint Node with the "bin/hdfs namenode -checkpoint"
command.

The default maximum delay between two consecutive checkpoints is 1 hour (which
is exactly what you want, right?). You can configure it to suit your
requirements through "dfs.namenode.checkpoint.period".
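
A minimal hdfs-site.xml snippet for that would be something like the following
(this assumes the 2.x-style property name quoted above; older releases used
"fs.checkpoint.period" instead, and the value is in seconds):

  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
  </property>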

HTH

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Thu, Dec 20, 2012 at 3:38 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Andy,
>
>             NN stores all the metadata in a file called as "fsimage". The
> fsimage file contains a snapshot of the HDFS metadata. Along with fsimage
> NN also holds  "edit log" files. Whenever there is a change to HDFS, it
> gets appended to the edits file. When these log files grow big, they are
> merged together with fsimage file. These files are stored on the local FS
> at the path specified by the "dfs.name.dir" property in "hdfs-site.xml"
> file. To prevent any loss you can give multiple locations as the value for
> this property, say 1 on your local disk and another on a network drive in
> case you HD get crashed you still have the metadata safe with you in that
> network drive.(The condition which you have faced recently)
>
> Now, coming to the SNN. It is a helper node for the NN. SNN periodically
> pulls the fsimage file, which would have grown quite big by now. And the NN
> starts the cycle again. Suppose, you are ruuning completely out of luck and
> loose the entire NN. In such a case you can take his copy of fsimage from
> the SNN and retrieve your metadata back.
>
> HTH
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <ab...@gmail.com> wrote:
>
>> Some reasons lead to my name node data error, but the error data also
>> overwrite the second name node data, also the NFS backup. I want to recover
>> the name node data a day ago or even a week ago,but I can't. I have to back
>> up name node data manually or write a bash script to backup it? why  hadoop
>> does not give a configure to   backup name node data to local disk daily or
>>  hourly with different time stamp name?
>>
>> The same question is to HBase's .META. and -ROOT- table. I think it's
>> history storage is more important 100  times than the log history.
>>
>> I think it could be implemented in Second Name Node/Check Points Node or
>> Back Node. Now I do this just using bash script.
>>
>> Some one agree with me?
>>
>>
>> Best Regards,
>> Andy Zhou
>>
>
>

Re: why not hadoop backup name node data to local disk daily or hourly?

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Andy,

            The NN stores all the metadata in a file called "fsimage". The
fsimage file contains a snapshot of the HDFS metadata. Along with the fsimage,
the NN also keeps "edit log" files. Whenever there is a change to HDFS, it
gets appended to the edits file. When these log files grow big, they are
merged into the fsimage file. These files are stored on the local FS
at the path specified by the "dfs.name.dir" property in the "hdfs-site.xml"
file. To prevent any loss you can give multiple locations as the value for
this property, say one on your local disk and another on a network drive, so
that if your hard disk crashes you still have the metadata safe on that
network drive (the situation you have faced recently).
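
As a rough illustration, the hdfs-site.xml entry could look like this (the
directory names are examples only - one on local disk, one on an NFS mount):

  <property>
    <name>dfs.name.dir</name>
    <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
  </property>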

Now, coming to the SNN: it is a helper node for the NN. The SNN periodically
pulls the fsimage and edits files (which would have grown quite big by now),
merges them, and the NN starts the cycle again. Suppose you are running
completely out of luck and lose the entire NN. In such a case you can take the
copy of the fsimage from the SNN and get your metadata back.

HTH

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <ab...@gmail.com> wrote:

> Some reasons lead to my name node data error, but the error data also
> overwrite the second name node data, also the NFS backup. I want to recover
> the name node data a day ago or even a week ago,but I can't. I have to back
> up name node data manually or write a bash script to backup it? why  hadoop
> does not give a configure to   backup name node data to local disk daily or
>  hourly with different time stamp name?
>
> The same question is to HBase's .META. and -ROOT- table. I think it's
> history storage is more important 100  times than the log history.
>
> I think it could be implemented in Second Name Node/Check Points Node or
> Back Node. Now I do this just using bash script.
>
> Some one agree with me?
>
>
> Best Regards,
> Andy Zhou
>
