Posted to common-user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2013/04/03 16:40:28 UTC

NameNode failure and recovery!

Hi all,

I was reading about Hadoop and learned that there are two ways to
protect against NameNode failures:

1) Write to an NFS mount in addition to the usual local disk.
 -or-
2) Use a Secondary NameNode (SNN). If the NN fails, the SNN can take
charge.

My questions:

1) The SNN always lags behind the NN. So when the SNN becomes primary after
an NN failure, the edits that have not yet been merged into the image file
would be lost, and the SNN would not be consistent with the NN as it was
before the failure.

2) I have also read that another purpose of the SNN is to periodically merge
the edit logs into the image file. If a setup goes with option #1 (writing
to NFS, no SNN), who does this merging?

Thanks,
Rahul

Re: NameNode failure and recovery!

Posted by Mohammad Tariq <do...@gmail.com>.
@Vijay : We seem to be in 100% sync though :)

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Apr 3, 2013 at 8:27 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Rahul,
>
>       It's always better to have both 1 and 2 together. One common
> misconception is that SNN is a backup of the NN, which is wrong. SNN is a
> helper node to the NN. In case of any failure SNN is not gonna take up the
> NN spot.
>
> Yes, we can't guarantee that the SNN fsimage replica will always be up to
> date. And when you are writing the metadata on a filer or NFS, you are just
> creating an additional copy of the metadata. Don't mistake it with SNN.
> When you specify value of your "dfs.name.dir" property as a comma separated
> list, which is localFS+NFS, you are just making sure that even if something
> goes wrong with the localFS, your metadata is still same in the NFS.
>
> But, it is still better to have the SNN in a separate machine. But you can
> never rely 100% on SNN, because of the fact you have already mentioned.
> It'll not be in 100% sync.
>
>
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, Apr 3, 2013 at 8:12 PM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> Or both the options are used together. NFS + SNN ?
>>
>>
>>
>>  On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I was reading about Hadoop and got to know that there are two ways to
>>> protect against the name node failures.
>>>
>>> 1) To write to a nfs mount along with the usual local disk.
>>>  -or-
>>> 2) Use secondary name node. In case of failure of NN , the SNN can take
>>> in charge.
>>>
>>> My questions :-
>>>
>>> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
>>> failure ,  then the edits which have not been merged into the image file
>>> would be lost , so the system of SNN would not be consistent with the NN
>>> before its failure.
>>>
>>> 2) Also I have read that other purpose of SNN is to periodically merge
>>> the edit logs with the image file. In case a setup goes with option #1
>>> (writing to NFS, no SNN) , then who does this merging.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>>
>>
>

Re: NameNode failure and recovery!

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Rahul,

      It's always better to have both 1 and 2 together. One common
misconception is that the SNN is a backup of the NN, which is wrong. The
SNN is a helper node for the NN; in the event of a failure, the SNN will
not take over the NN's role.

Yes, we can't guarantee that the SNN's fsimage replica will always be up to
date. And when you write the metadata to a filer or NFS, you are just
creating an additional copy of the metadata; don't confuse that with the
SNN. When you set your "dfs.name.dir" property to a comma-separated list
(local FS + NFS), you are just making sure that even if something goes
wrong with the local FS, your metadata is still intact on NFS.

Still, it is better to run the SNN on a separate machine. But you can
never rely 100% on the SNN, for the reason you have already mentioned:
it will not be 100% in sync.
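Concretely, the redundant-metadata setup is just a multi-valued
`dfs.name.dir` in hdfs-site.xml. A sketch for Hadoop 1.x; the local path
and the NFS mount point shown here are hypothetical:

```
<!-- hdfs-site.xml: the NN writes its fsimage and edits to every
     directory in this list, so the NFS copy survives a local-disk loss. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name,/mnt/nfs/namenode</value>
</property>
```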



Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Apr 3, 2013 at 8:12 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> Or both the options are used together. NFS + SNN ?
>
>
>
>  On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> Hi all,
>>
>> I was reading about Hadoop and got to know that there are two ways to
>> protect against the name node failures.
>>
>> 1) To write to a nfs mount along with the usual local disk.
>>  -or-
>> 2) Use secondary name node. In case of failure of NN , the SNN can take
>> in charge.
>>
>> My questions :-
>>
>> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
>> failure ,  then the edits which have not been merged into the image file
>> would be lost , so the system of SNN would not be consistent with the NN
>> before its failure.
>>
>> 2) Also I have read that other purpose of SNN is to periodically merge
>> the edit logs with the image file. In case a setup goes with option #1
>> (writing to NFS, no SNN) , then who does this merging.
>>
>> Thanks,
>> Rahul
>>
>>
>>
>

Re: NameNode failure and recovery!

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Or both the options are used together. NFS + SNN ?



On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> Hi all,
>
> I was reading about Hadoop and got to know that there are two ways to
> protect against the name node failures.
>
> 1) To write to a nfs mount along with the usual local disk.
>  -or-
> 2) Use secondary name node. In case of failure of NN , the SNN can take in
> charge.
>
> My questions :-
>
> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> failure ,  then the edits which have not been merged into the image file
> would be lost , so the system of SNN would not be consistent with the NN
> before its failure.
>
> 2) Also I have read that other purpose of SNN is to periodically merge the
> edit logs with the image file. In case a setup goes with option #1 (writing
> to NFS, no SNN) , then who does this merging.
>
> Thanks,
> Rahul
>
>
>

RE: NameNode failure and recovery!

Posted by Vijay Thakorlal <vi...@hotmail.com>.
Hi Rahul,

The SNN does not act as a backup / standby NameNode in the event of a failure.

The sole purpose of the Secondary NameNode (or, as it's more correctly
known, the Checkpoint Node) is to checkpoint the current state of HDFS:

1. The SNN retrieves the fsimage and edits files from the NN.
2. The NN rolls the edits file.
3. The SNN loads the fsimage into memory.
4. The SNN then replays the edits log to merge the two.
5. The SNN transfers the merged checkpoint back to the NN.
6. The NN uses the checkpoint as the new fsimage file.

It's true that you could technically use the fsimage from the SNN if you
completely lost the NN, and yes, as you said, you would "lose" any changes
to HDFS that occurred between the last checkpoint and the NN dying. But,
as mentioned, the SNN is not a backup for the NN.

Regards,

Vijay
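The merge in the checkpoint cycle above can be sketched as a toy model
(illustration only; real Hadoop stores binary fsimage/edits files, not
Python objects, and supports many more operation types):

```python
# Toy model of the SNN checkpoint merge: the fsimage is a snapshot of the
# namespace, and the edits log is the list of operations applied since.

def checkpoint(fsimage, edits):
    """Merge an edit log into a namespace image, as the SNN does."""
    merged = dict(fsimage)           # step 3: load the fsimage into memory
    for op, path, meta in edits:     # step 4: replay each logged operation
        if op == "create":
            merged[path] = meta
        elif op == "delete":
            merged.pop(path, None)
    return merged                    # steps 5-6: this becomes the new fsimage

fsimage = {"/a": "file", "/b": "file"}
edits = [("create", "/c", "file"), ("delete", "/a", None)]
new_image = checkpoint(fsimage, edits)
```

Anything sitting only in the NN's in-memory state and unrolled edits at
crash time is exactly what the SNN's last checkpoint cannot recover.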

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

Hi all,

I was reading about Hadoop and got to know that there are two ways to protect against the name node failures.

1) To write to a nfs mount along with the usual local disk.
 -or-
2) Use secondary name node. In case of failure of NN , the SNN can take in charge.

My questions :-

1) SNN is always lagging , so when SNN becomes primary in event of a NN failure ,  then the edits which have not been merged into the image file would be lost , so the system of SNN would not be consistent with the NN before its failure.

2) Also I have read that other purpose of SNN is to periodically merge the edit logs with the image file. In case a setup goes with option #1 (writing to NFS, no SNN) , then who does this merging.

Thanks,
Rahul


Re: NameNode failure and recovery!

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
That's also doable. Reducing the checkpoint period would still leave
some amount of edit-log loss, and how short the checkpoint interval
should be has to be evaluated. I think the good way to go, in case HA is
not doable, is the SNN plus secondary storage on NFS.
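The trade-off can be made concrete: in the worst case every edit since the
previous checkpoint is lost, so the loss window is bounded by the checkpoint
period plus however long a checkpoint run takes (a back-of-the-envelope
sketch; the 60-second checkpoint duration is an assumed figure):

```python
# Worst-case metadata-loss window if the NN dies just before a checkpoint:
# every edit since the previous checkpoint existed only on the (dead) NN.
def worst_case_loss_seconds(checkpoint_period_s, checkpoint_duration_s=0):
    # The window is the period itself plus the time a checkpoint takes.
    return checkpoint_period_s + checkpoint_duration_s

for period in (3600, 900, 300):   # default hour, 15 minutes, 5 minutes
    print(period, "->", worst_case_loss_seconds(period, 60))
```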

Thanks,
Rahul


On Thu, Apr 4, 2013 at 12:19 AM, shashwat shriparv <
dwivedishashwat@gmail.com> wrote:

> If you are not in position to go for HA just keep your checkpoint period
> shorter to have recent data recoverable from SNN.
>
> and you always have a option
> hadoop namenode -recover
> try this on testing cluster and get versed to it.
>
> and take backup of image at some solid state storage.
>
>
>
> ∞
> Shashwat Shriparv
>
>
>
> On Wed, Apr 3, 2013 at 9:56 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> There is a 3rd, most excellent way: Use HDFS's own HA, see
>>
>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>> :)
>>
>> On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee
>> <ra...@gmail.com> wrote:
>> > Hi all,
>> >
>> > I was reading about Hadoop and got to know that there are two ways to
>> > protect against the name node failures.
>> >
>> > 1) To write to a nfs mount along with the usual local disk.
>> >  -or-
>> > 2) Use secondary name node. In case of failure of NN , the SNN can take
>> in
>> > charge.
>> >
>> > My questions :-
>> >
>> > 1) SNN is always lagging , so when SNN becomes primary in event of a NN
>> > failure ,  then the edits which have not been merged into the image file
>> > would be lost , so the system of SNN would not be consistent with the NN
>> > before its failure.
>> >
>> > 2) Also I have read that other purpose of SNN is to periodically merge
>> the
>> > edit logs with the image file. In case a setup goes with option #1
>> (writing
>> > to NFS, no SNN) , then who does this merging.
>> >
>> > Thanks,
>> > Rahul
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: NameNode failure and recovery!

Posted by shashwat shriparv <dw...@gmail.com>.
If you are not in a position to go for HA, just keep your checkpoint
period shorter so that recent data is recoverable from the SNN.

You also always have the option of
hadoop namenode -recover
Try this on a test cluster and get versed in it.

And take a backup of the image on some solid-state storage.
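Shortening the checkpoint period is a single property. A sketch for
Hadoop 1.x, where the knob is `fs.checkpoint.period` (the SNN reads it
from its configuration; the default is 3600 seconds):

```
<!-- core-site.xml on the SNN: checkpoint every 15 minutes instead of the
     default hour, shrinking the window of edits only the NN has seen. -->
<property>
  <name>fs.checkpoint.period</name>
  <value>900</value>
</property>
```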



∞
Shashwat Shriparv



On Wed, Apr 3, 2013 at 9:56 PM, Harsh J <ha...@cloudera.com> wrote:

> There is a 3rd, most excellent way: Use HDFS's own HA, see
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
> :)
>
> On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee
> <ra...@gmail.com> wrote:
> > Hi all,
> >
> > I was reading about Hadoop and got to know that there are two ways to
> > protect against the name node failures.
> >
> > 1) To write to a nfs mount along with the usual local disk.
> >  -or-
> > 2) Use secondary name node. In case of failure of NN , the SNN can take
> in
> > charge.
> >
> > My questions :-
> >
> > 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> > failure ,  then the edits which have not been merged into the image file
> > would be lost , so the system of SNN would not be consistent with the NN
> > before its failure.
> >
> > 2) Also I have read that other purpose of SNN is to periodically merge
> the
> > edit logs with the image file. In case a setup goes with option #1
> (writing
> > to NFS, no SNN) , then who does this merging.
> >
> > Thanks,
> > Rahul
> >
> >
>
>
>
> --
> Harsh J
>

Re: NameNode failure and recovery!

Posted by shashwat shriparv <dw...@gmail.com>.
If you are not in position to go for HA just keep your checkpoint period
shorter to have recent data recoverable from SNN.

and you always have a option
hadoop namenode -recover
try this on testing cluster and get versed to it.

and take backup of image at some solid state storage.



∞
Shashwat Shriparv



On Wed, Apr 3, 2013 at 9:56 PM, Harsh J <ha...@cloudera.com> wrote:

> There is a 3rd, most excellent way: Use HDFS's own HA, see
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
> :)
>
> On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee
> <ra...@gmail.com> wrote:
> > Hi all,
> >
> > I was reading about Hadoop and got to know that there are two ways to
> > protect against the name node failures.
> >
> > 1) To write to a nfs mount along with the usual local disk.
> >  -or-
> > 2) Use secondary name node. In case of failure of NN , the SNN can take
> in
> > charge.
> >
> > My questions :-
> >
> > 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> > failure ,  then the edits which have not been merged into the image file
> > would be lost , so the system of SNN would not be consistent with the NN
> > before its failure.
> >
> > 2) Also I have read that other purpose of SNN is to periodically merge
> the
> > edit logs with the image file. In case a setup goes with option #1
> (writing
> > to NFS, no SNN) , then who does this merging.
> >
> > Thanks,
> > Rahul
> >
> >
>
>
>
> --
> Harsh J
>

Re: NameNode failure and recovery!

Posted by Harsh J <ha...@cloudera.com>.
There is a 3rd, most excellent way: Use HDFS's own HA, see
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
:)
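For reference, a minimal sketch of the QJM-based HA configuration from that guide (the nameservice ID and hostnames here are placeholders; a full setup also needs the failover proxy provider, fencing, and ZKFC settings described in the doc):

```xml
<!-- hdfs-site.xml sketch: one logical nameservice backed by two NNs,
     sharing edits through a quorum of JournalNodes. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```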

On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Hi all,
>
> I was reading about Hadoop and got to know that there are two ways to
> protect against the name node failures.
>
> 1) To write to a nfs mount along with the usual local disk.
>  -or-
> 2) Use secondary name node. In case of failure of NN , the SNN can take in
> charge.
>
> My questions :-
>
> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> failure ,  then the edits which have not been merged into the image file
> would be lost , so the system of SNN would not be consistent with the NN
> before its failure.
>
> 2) Also I have read that other purpose of SNN is to periodically merge the
> edit logs with the image file. In case a setup goes with option #1 (writing
> to NFS, no SNN) , then who does this merging.
>
> Thanks,
> Rahul
>
>



-- 
Harsh J

Re: NameNode failure and recovery!

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Or are both of the options used together, NFS + SNN?



On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> Hi all,
>
> I was reading about Hadoop and got to know that there are two ways to
> protect against the name node failures.
>
> 1) To write to a nfs mount along with the usual local disk.
>  -or-
> 2) Use secondary name node. In case of failure of NN , the SNN can take in
> charge.
>
> My questions :-
>
> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> failure ,  then the edits which have not been merged into the image file
> would be lost , so the system of SNN would not be consistent with the NN
> before its failure.
>
> 2) Also I have read that other purpose of SNN is to periodically merge the
> edit logs with the image file. In case a setup goes with option #1 (writing
> to NFS, no SNN) , then who does this merging.
>
> Thanks,
> Rahul
>
>
>

Re: NameNode failure and recovery!

Posted by Mohammad Tariq <do...@gmail.com>.
If it's not possible to restart the NN daemon on the same box, then yes.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Apr 3, 2013 at 9:30 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> Thanks to all of you for precise and complete responses.
>
> S
> o in case of failure we have to bring another backup system up with the
> fsimage and edit logs from the NFS filer.
> SNN stays as is for the new NN.
>
> Thanks,
> Rahul
>
>
> On Wed, Apr 3, 2013 at 8:38 PM, Azuryy Yu <az...@gmail.com> wrote:
>
>> for Hadoopv2, there is HA, so SNN is not necessary.
>> On Apr 3, 2013 10:41 PM, "Rahul Bhattacharjee" <ra...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I was reading about Hadoop and got to know that there are two ways to
>>> protect against the name node failures.
>>>
>>> 1) To write to a nfs mount along with the usual local disk.
>>>  -or-
>>> 2) Use secondary name node. In case of failure of NN , the SNN can take
>>> in charge.
>>>
>>> My questions :-
>>>
>>> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
>>> failure ,  then the edits which have not been merged into the image file
>>> would be lost , so the system of SNN would not be consistent with the NN
>>> before its failure.
>>>
>>> 2) Also I have read that other purpose of SNN is to periodically merge
>>> the edit logs with the image file. In case a setup goes with option #1
>>> (writing to NFS, no SNN) , then who does this merging.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>>
>

Re: NameNode failure and recovery!

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks to all of you for precise and complete responses.

So in case of failure we have to bring another backup system up with the
fsimage and edit logs from the NFS filer.
SNN stays as is for the new NN.
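A toy simulation of that restore step, just to show its shape: temporary directories stand in for the NFS mount and the replacement NameNode's dfs.name.dir (on a real cluster you would copy the metadata directory and start the NN against it, or use hadoop namenode -importCheckpoint):

```shell
# Illustrative only: "restore" NameNode metadata from an NFS copy.
# mktemp dirs stand in for the real NFS mount and the new dfs.name.dir.
NFS_COPY=$(mktemp -d)
NEW_NAME_DIR=$(mktemp -d)
mkdir -p "$NFS_COPY/current"
printf 'image-bytes' > "$NFS_COPY/current/fsimage"
printf 'edit-bytes'  > "$NFS_COPY/current/edits"
# Bring the replacement box's name directory up from the NFS copy.
cp -a "$NFS_COPY/current" "$NEW_NAME_DIR/"
ls "$NEW_NAME_DIR/current"   # now holds fsimage and edits
```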

Thanks,
Rahul


On Wed, Apr 3, 2013 at 8:38 PM, Azuryy Yu <az...@gmail.com> wrote:

> for Hadoopv2, there is HA, so SNN is not necessary.
> On Apr 3, 2013 10:41 PM, "Rahul Bhattacharjee" <ra...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I was reading about Hadoop and got to know that there are two ways to
>> protect against the name node failures.
>>
>> 1) To write to a nfs mount along with the usual local disk.
>>  -or-
>> 2) Use secondary name node. In case of failure of NN , the SNN can take
>> in charge.
>>
>> My questions :-
>>
>> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
>> failure ,  then the edits which have not been merged into the image file
>> would be lost , so the system of SNN would not be consistent with the NN
>> before its failure.
>>
>> 2) Also I have read that other purpose of SNN is to periodically merge
>> the edit logs with the image file. In case a setup goes with option #1
>> (writing to NFS, no SNN) , then who does this merging.
>>
>> Thanks,
>> Rahul
>>
>>
>>

Re: NameNode failure and recovery!

Posted by Azuryy Yu <az...@gmail.com>.
For Hadoop v2, there is HA, so the SNN is not necessary.
On Apr 3, 2013 10:41 PM, "Rahul Bhattacharjee" <ra...@gmail.com>
wrote:

> Hi all,
>
> I was reading about Hadoop and got to know that there are two ways to
> protect against the name node failures.
>
> 1) To write to a nfs mount along with the usual local disk.
>  -or-
> 2) Use secondary name node. In case of failure of NN , the SNN can take in
> charge.
>
> My questions :-
>
> 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> failure ,  then the edits which have not been merged into the image file
> would be lost , so the system of SNN would not be consistent with the NN
> before its failure.
>
> 2) Also I have read that other purpose of SNN is to periodically merge the
> edit logs with the image file. In case a setup goes with option #1 (writing
> to NFS, no SNN) , then who does this merging.
>
> Thanks,
> Rahul
>
>
>

RE: NameNode failure and recovery!

Posted by Vijay Thakorlal <vi...@hotmail.com>.
Hi Rahul,

 

The SNN does not act as a backup / standby NameNode in the event of failure. 

 

The sole purpose of the Secondary NameNode (or, as it’s more correctly known, the Checkpoint Node) is to perform checkpointing of the current state of HDFS:

 

1. The SNN retrieves the fsimage and edits files from the NN

2. The NN rolls the edits file

3. The SNN loads the fsimage into memory

4. The SNN then replays the edits log file to merge the two

5. The SNN transfers the merged checkpoint back to the NN

6. The NN uses the checkpoint as the new fsimage file
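Those steps amount to "load the snapshot, replay the journal". A toy sketch of the merge (plain Python, not Hadoop code; the create/delete operation names are invented for illustration):

```python
# Illustrative sketch only (NOT Hadoop code): a checkpoint merges the
# on-disk namespace snapshot (fsimage) with the journal of changes
# recorded since that snapshot (edits).
def checkpoint(fsimage, edits):
    """Replay each logged operation on top of the loaded image."""
    namespace = dict(fsimage)        # step: load the fsimage into memory
    for op, path, value in edits:    # step: replay the edits log
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
    return namespace                 # becomes the new fsimage

fsimage = {"/user/a": "file-a"}
edits = [("create", "/user/b", "file-b"), ("delete", "/user/a", None)]
new_fsimage = checkpoint(fsimage, edits)
# new_fsimage == {"/user/b": "file-b"}
```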

 

It’s true that technically you could use the fsimage from the SNN if you completely lost the NN – and yes, as you said, you would “lose” any changes to HDFS that occurred between the NN dying and the last time a checkpoint occurred. But as mentioned, the SNN is not a backup for the NN.

 

Regards,

Vijay

 

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com] 
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

 

Hi all,

I was reading about Hadoop and got to know that there are two ways to protect against the name node failures.

1) To write to a nfs mount along with the usual local disk.

 -or-

2) Use secondary name node. In case of failure of NN , the SNN can take in charge. 

My questions :-

1) SNN is always lagging , so when SNN becomes primary in event of a NN failure ,  then the edits which have not been merged into the image file would be lost , so the system of SNN would not be consistent with the NN before its failure.

2) Also I have read that other purpose of SNN is to periodically merge the edit logs with the image file. In case a setup goes with option #1 (writing to NFS, no SNN) , then who does this merging.

 

Thanks,
Rahul

 

