Posted to user@hadoop.apache.org by Bertrand Dechoux <de...@gmail.com> on 2012/09/21 10:17:08 UTC

Relevance of dfs.safemode.extension?

Hi,

I would like to know the relevance of dfs.safemode.extension.
Why would someone wait after leaving the safemode?
Why is it recommended to keep it at 30000 (30 seconds) rather than
setting it to 0?

Regards

Bertrand

Re: Relevance of dfs.safemode.extension?

Posted by Harsh J <ha...@cloudera.com>.
For large clusters, it helps if the NN waits for N seconds after the
threshold percentage is satisfied (the percentage of blocks that have
their minimum number of replicas available) so that the other DNs get
some extra time to report their blocks as well, which helps ease the
initial client load the cluster receives. This is where the extension
is useful (and it is certainly tunable to a more suitable value).

For small clusters (single rack or so) you can probably set it to 0 to
skip the extra wait.
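
To make the interaction between the threshold and the extension
concrete, here is a minimal sketch in Python. It is a toy model, not
Hadoop's actual code: the function name and the shape of the input are
made up for illustration, and it assumes the fraction of safely
replicated blocks only grows while DNs report in. The defaults mirror
the stock values of dfs.safemode.threshold.pct (0.999) and
dfs.safemode.extension (30000 ms).

```python
from typing import Iterable, Optional, Tuple


def safemode_exit_time(
    reports: Iterable[Tuple[int, float]],
    threshold_pct: float = 0.999,
    extension_ms: int = 30000,
) -> Optional[int]:
    """Toy model of when a NameNode would leave safemode on its own.

    reports: (time_ms, safe_fraction) pairs sorted by time, where
        safe_fraction is the share of blocks that already have their
        minimum number of replicas reported (assumed non-decreasing).
    Returns the exit time in ms, or None if the threshold is never met.
    """
    for time_ms, safe_fraction in reports:
        if safe_fraction >= threshold_pct:
            # Threshold met: wait out the extension, then leave.
            return time_ms + extension_ms
    # Threshold never met (always the case for a threshold above 1).
    return None


# Hypothetical startup: block reports trickle in over 12 seconds.
reports = [(0, 0.40), (5000, 0.95), (8000, 0.999), (12000, 1.0)]

print(safemode_exit_time(reports))                     # 38000
print(safemode_exit_time(reports, extension_ms=0))     # 8000
print(safemode_exit_time(reports, threshold_pct=1.1))  # None
```

In this model the threshold is met at 8 seconds, when only 99.9% of
the blocks are known. The default extension keeps the NN in safemode
until the 38 second mark, by which time the remaining DNs have
reported as well. With an extension of 0 the NN opens up at 8 seconds.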

However, if you're ever doing NN recovery work (one reason the NN may
have been down), I recommend setting the threshold itself to a value
greater than 1 (such as 1.1f) so that the NN doesn't auto-exit safemode
until you're sure that the new inode/block counts are alright and you
haven't made any mistakes in the recovery process. You can then exit
safemode manually once you're sure. In safemode, the NN does not issue
block deletions, so mistakes (such as accidentally starting with an old
copy of the fsimage) would not cause data loss.
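
For the manual exit, a small wrapper around the dfsadmin safemode
command could look like the sketch below. The helper names are
hypothetical, and it assumes the "hadoop" launcher is on the PATH
(newer releases use "hdfs dfsadmin" instead, hence the launcher
parameter).

```python
import subprocess
from typing import List

# Actions accepted by "dfsadmin -safemode".
VALID_ACTIONS = ("get", "enter", "leave", "wait")


def safemode_command(action: str, launcher: str = "hadoop") -> List[str]:
    """Build the argv for a dfsadmin safemode call (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError("unknown safemode action: %r" % (action,))
    return [launcher, "dfsadmin", "-safemode", action]


def run_safemode(action: str, launcher: str = "hadoop") -> str:
    """Run the command and return its output; raises if the call fails."""
    result = subprocess.run(
        safemode_command(action, launcher),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Typical use after a recovery, once the inode/block counts look right:
#   print(run_safemode("get"))    # report the current safemode state
#   print(run_safemode("leave"))  # exit safemode manually
```

Keeping the command construction separate from the subprocess call
makes it easy to review exactly what would be run before pointing it
at a live NN.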

On Fri, Sep 21, 2012 at 1:47 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> Hi,
>
> I would like to know the relevance of dfs.safemode.extension.
> Why would someone wait after leaving the safemode?
> Why is it recommended to keep it at 30000 (30 seconds) rather than
> setting it to 0?
>
> Regards
>
> Bertrand



-- 
Harsh J
