Posted to user@hadoop.apache.org by Frank Lanitz <fr...@sql-ag.de> on 2015/01/23 08:19:31 UTC

Time until a datanode is marked as dead

Hi,

I'm trying to configure how long it takes until a datanode is considered
dead. Currently it appears to be set to about 10 min, which is a
little too high for my scenario. As I wasn't able to find an obvious
flag, I've tried setting some properties that might do that,
without success. For example, I've put this into my hdfs-site.xml:

<property>
    <name>dfs.namenode.check.stale.datanode</name>
    <value>true</value>
    <description>Activate stale check</description>
</property>

<property>
    <name>dfs.namenode.stale.datanode.interval</name>
    <value>10</value>
    <description>Timeout</description>
</property>

So my question is: which option(s) do I have to set in order to, e.g.,
decrease the time needed to mark a datanode as dead to 5 min when running 2.6?

Cheers,
Frank

Re: Time until a datanode is marked as dead

Posted by Chris Nauroth <cn...@hortonworks.com>.

I believe all properties related to stale datanode configuration are
already covered in hdfs-default.xml,
but dfs.namenode.heartbeat.recheck-interval is definitely missing.

Frank, if you file the jira, then a nice benefit is that you'll get signed
up automatically for notifications on it when someone makes progress on it.

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Mon, Jan 26, 2015 at 8:00 AM, Nicolas Liochon <nk...@gmail.com> wrote:

> Note that there is a difference between being dead and being stale. Stale
> means "avoid as much as possible", while dead means "avoid absolutely AND
> initiate a recovery, i.e. copy all the data (typically 1 TB or more)".
>
> There is some info on this blog entry:
> http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/
>
> Cheers,
>
> Nicolas
>
>
> On Mon, Jan 26, 2015 at 10:46 AM, Azuryy Yu <az...@gmail.com> wrote:
>
>> Hi Frank,
>>
>> Can you file an issue to add this configuration to hdfs-default.xml?
>>
>> On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz <fr...@sql-ag.de>
>> wrote:
>>
>>> Hi,
>>>
>>> On 23.01.2015 at 19:23, Chris Nauroth wrote:
>>> > The time period for determining if a datanode is dead is calculated as a
>>> > function of a few different configuration properties.  The current
>>> > implementation in DatanodeManager.java does it like this:
>>> >
>>> >     final long heartbeatIntervalSeconds = conf.getLong(
>>> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>>> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
>>> >     final int heartbeatRecheckInterval = conf.getInt(
>>> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
>>> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT);
>>> >         // 5 minutes (default recheck interval)
>>> >     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
>>> >         + 10 * 1000 * heartbeatIntervalSeconds;
>>>
>>>
>>> Good to know.
>>>
>>> > Under default configuration, dfs.namenode.heartbeat.recheck-interval is
>>> > 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
>>> > values into the formula, we get 10.5 minutes, which agrees with your
>>> > observation.  If you change dfs.namenode.heartbeat.recheck-interval to
>>> > 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
>>> > before a datanode is marked dead.
>>> >
>>> > dfs.namenode.heartbeat.recheck-interval is not documented in
>>> > hdfs-default.xml, though I don't recall if that's an intentional choice
>>> > or just an oversight.  The value of the property must be expressed in
>>> > milliseconds.
>>>
>>> This did the trick. Thank you very much. For testing purposes I've set it
>>> to 10000, and after approx. 45 s the node was marked as dead.
>>>
>>> Is there any chance of getting this into a documented preference, so that
>>> possible behavior changes in future releases can be spotted before they
>>> reach the staging area?
>>>
>>> cheers,
>>> Frank
>>>
>>
>>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.
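
The arithmetic quoted above can be checked with a small stand-alone sketch.
It only reproduces the formula from the quoted DatanodeManager.java snippet;
the class and method names (DeadNodeTimeout, expireIntervalMs) are made up
for illustration and are not part of Hadoop.

```java
/** Sketch of the dead-datanode timeout formula quoted in this thread (not Hadoop code). */
final class DeadNodeTimeout {

    /**
     * @param recheckIntervalMs  value of dfs.namenode.heartbeat.recheck-interval, in milliseconds
     * @param heartbeatIntervalS value of dfs.heartbeat.interval, in seconds
     * @return time without heartbeats after which a datanode is marked dead, in milliseconds
     */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalS) {
        // Same shape as the quoted line: 2 * recheck + 10 * 1000 * heartbeat
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalS;
    }

    public static void main(String[] args) {
        long heartbeatS = 3; // default dfs.heartbeat.interval, per the thread
        // Default, the suggested 2.5 minutes, and the test value of 10000 ms
        long[] recheckValuesMs = {300_000, 150_000, 10_000};
        for (long recheckMs : recheckValuesMs) {
            System.out.println("recheck=" + recheckMs + " ms -> dead after "
                    + expireIntervalMs(recheckMs, heartbeatS) + " ms");
        }
    }
}
```

With the values from the thread this gives 630000 ms (10.5 minutes) for the
default, 330000 ms (5.5 minutes) for a recheck interval of 150000, and
50000 ms for the test value of 10000, which is close to the roughly 45 s
that was observed.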

Re: Time until a datanode is marked as dead

Posted by Nicolas Liochon <nk...@gmail.com>.

Note that there is a difference between being dead and being stale. Stale
means "avoid as much as possible", while dead means "avoid absolutely AND
initiate a recovery, i.e. copy all the data (typically 1 TB or more)".

There is some info on this blog entry:
http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/

Cheers,

Nicolas


On Mon, Jan 26, 2015 at 10:46 AM, Azuryy Yu <az...@gmail.com> wrote:

> Hi Frank,
>
> Can you file an issue to add this configuration to hdfs-default.xml?
>
> On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz <fr...@sql-ag.de>
> wrote:
>
>> Hi,
>>
>> On 23.01.2015 at 19:23, Chris Nauroth wrote:
>> > The time period for determining if a datanode is dead is calculated as a
>> > function of a few different configuration properties.  The current
>> > implementation in DatanodeManager.java does it like this:
>> >
>> >     final long heartbeatIntervalSeconds = conf.getLong(
>> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
>> >     final int heartbeatRecheckInterval = conf.getInt(
>> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
>> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT);
>> >         // 5 minutes (default recheck interval)
>> >     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
>> >         + 10 * 1000 * heartbeatIntervalSeconds;
>>
>>
>> Good to know.
>>
>> > Under default configuration, dfs.namenode.heartbeat.recheck-interval is
>> > 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
>> > values into the formula, we get 10.5 minutes, which agrees with your
>> > observation.  If you change dfs.namenode.heartbeat.recheck-interval to
>> > 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
>> > before a datanode is marked dead.
>> >
>> > dfs.namenode.heartbeat.recheck-interval is not documented in
>> > hdfs-default.xml, though I don't recall if that's an intentional choice
>> > or just an oversight.  The value of the property must be expressed in
>> > milliseconds.
>>
>> This did the trick. Thank you very much. For testing purposes I've set it
>> to 10000, and after approx. 45 s the node was marked as dead.
>>
>> Is there any chance of getting this into a documented preference, so that
>> possible behavior changes in future releases can be spotted before they
>> reach the staging area?
>>
>> cheers,
>> Frank
>>
>
>
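
The stale/dead distinction can be pictured as two thresholds on the time
since the last heartbeat. The following is only a rough sketch under stated
assumptions, not how the namenode is implemented: the class NodeState, the
enum, and the classify method are hypothetical, the exact boundary
comparisons are a guess, and the 30000 ms stale interval is an assumed
value that should be checked against hdfs-default.xml for your release.

```java
/** Hypothetical illustration of live vs. stale vs. dead; not Hadoop code. */
final class NodeState {

    enum State { LIVE, STALE, DEAD }

    /**
     * Classifies a datanode by how long ago its last heartbeat arrived.
     * Stale: still known to the cluster, but avoided where possible.
     * Dead: avoided entirely, and its blocks get re-replicated.
     */
    static State classify(long msSinceLastHeartbeat, long staleIntervalMs, long expireIntervalMs) {
        if (msSinceLastHeartbeat > expireIntervalMs) {
            return State.DEAD;
        }
        if (msSinceLastHeartbeat > staleIntervalMs) {
            return State.STALE;
        }
        return State.LIVE;
    }

    public static void main(String[] args) {
        long staleMs = 30_000;   // assumed stale interval (verify for your version)
        long expireMs = 630_000; // 10.5 minutes, from the formula quoted in this thread
        for (long elapsedMs : new long[] {10_000, 60_000, 700_000}) {
            System.out.println(elapsedMs + " ms since last heartbeat: "
                    + classify(elapsedMs, staleMs, expireMs));
        }
    }
}
```

The point of the sketch is that lowering the stale threshold is cheap, while
lowering the dead threshold can trigger re-replication of everything the
node held.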

Re: Time until a datanode is marked as dead

Posted by Frank Lanitz <fr...@sql-ag.de>.

On 26.01.2015 at 10:46, Azuryy Yu wrote:
> can you file an issue to add this configuration to the hdfs-default.xml?

Done with
https://issues.apache.org/jira/browse/HDFS-7685

Cheers,
Frank

Re: Time until a datanode is marked as dead

Posted by Azuryy Yu <az...@gmail.com>.

Hi Frank,

Can you file an issue to add this configuration to hdfs-default.xml?

On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz <fr...@sql-ag.de>
wrote:

> Hi,
>
> On 23.01.2015 at 19:23, Chris Nauroth wrote:
> > The time period for determining if a datanode is dead is calculated as a
> > function of a few different configuration properties.  The current
> > implementation in DatanodeManager.java does it like this:
> >
> >     final long heartbeatIntervalSeconds = conf.getLong(
> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
> >     final int heartbeatRecheckInterval = conf.getInt(
> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT);
> >         // 5 minutes (default recheck interval)
> >     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
> >         + 10 * 1000 * heartbeatIntervalSeconds;
>
>
> Good to know.
>
> > Under default configuration, dfs.namenode.heartbeat.recheck-interval is
> > 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
> > values into the formula, we get 10.5 minutes, which agrees with your
> > observation.  If you change dfs.namenode.heartbeat.recheck-interval to
> > 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
> > before a datanode is marked dead.
> >
> > dfs.namenode.heartbeat.recheck-interval is not documented in
> > hdfs-default.xml, though I don't recall if that's an intentional choice
> > or just an oversight.  The value of the property must be expressed in
> > milliseconds.
>
> This did the trick. Thank you very much. For testing purposes I've set it
> to 10000, and after approx. 45 s the node was marked as dead.
>
> Is there any chance of getting this into a documented preference, so that
> possible behavior changes in future releases can be spotted before they
> reach the staging area?
>
> cheers,
> Frank
>
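
To hit a specific timeout such as the 5 minutes asked for in the original
post, the quoted formula can be rearranged to solve for the recheck
interval. This is a sketch under the assumption that the formula holds
unchanged for your release; RecheckIntervalSolver and recheckIntervalMsFor
are illustrative names, not Hadoop APIs.

```java
/** Solves the quoted timeout formula for the recheck interval (illustrative only). */
final class RecheckIntervalSolver {

    /**
     * @param targetTimeoutMs    desired time until a datanode is marked dead, in milliseconds
     * @param heartbeatIntervalS value of dfs.heartbeat.interval, in seconds
     * @return value to use for dfs.namenode.heartbeat.recheck-interval, in milliseconds
     */
    static long recheckIntervalMsFor(long targetTimeoutMs, long heartbeatIntervalS) {
        long heartbeatPartMs = 10 * 1000 * heartbeatIntervalS;
        if (targetTimeoutMs <= heartbeatPartMs) {
            throw new IllegalArgumentException(
                    "target must exceed 10 heartbeat intervals (" + heartbeatPartMs + " ms)");
        }
        // timeout = 2 * recheck + heartbeatPart  =>  recheck = (timeout - heartbeatPart) / 2
        return (targetTimeoutMs - heartbeatPartMs) / 2;
    }

    public static void main(String[] args) {
        long targetMs = 5 * 60 * 1000; // 5 minutes
        long heartbeatS = 3;           // default dfs.heartbeat.interval, per the thread
        System.out.println("recheck-interval = "
                + recheckIntervalMsFor(targetMs, heartbeatS) + " ms");
    }
}
```

For a 5 minute target and the default 3 second heartbeat this yields
135000 ms, since 2 * 135000 + 30000 = 300000.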

Re: Time until a datanode is marked as dead

Posted by Azuryy Yu <az...@gmail.com>.

Hi Frank,

can you file an issue to add this configuration to the hdfs-default.xml?

On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz <fr...@sql-ag.de>
wrote:

> Hi,
>
> Am 23.01.2015 um 19:23 schrieb Chris Nauroth:
> > The time period for determining if a datanode is dead is calculated as a
> > function of a few different configuration properties.  The current
> > implementation in DatanodeManager.java does it like this:
> >
> >     final long heartbeatIntervalSeconds = conf.getLong(
> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
> >     final int heartbeatRecheckInterval = conf.getInt(
> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT);
> > // 5 minutes
> >     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
> >         + 10 * 1000 * heartbeatIntervalSeconds;
>
>
> Good to know.
>
> > Under default configuration, dfs.namenode.heartbeat.recheck-interval is
> > 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
> > values into the formula, we get 10.5 minutes, which agrees with your
> > observation.  If you change dfs.namenode.heartbeat.recheck-interval to
> > 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
> > before a datanode is marked dead.
> >
> > dfs.namenode.heartbeat.recheck-interval is not documented in
> > hdfs-default.xml, though I don't recall if that's an intentional choice
> > or just an oversight.  The value of the property must be expressed in
> > milliseconds.
>
> This did the trick. Thank you very much. For testing porpuse I've set it
> to 10000 and after approx 45s the node was marked as dead.
>
> Any chance to get this into a documented preference so possible behavior
> changes with future releases can be spotted before staging area.
>
> cheers,
> Frank
>

Re: Time until a datanode is marked as dead

Posted by Azuryy Yu <az...@gmail.com>.

Hi Frank,

can you file an issue to add this configuration to the hdfs-default.xml?

On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz <fr...@sql-ag.de>
wrote:

> Hi,
>
> Am 23.01.2015 um 19:23 schrieb Chris Nauroth:
> > The time period for determining if a datanode is dead is calculated as a
> > function of a few different configuration properties.  The current
> > implementation in DatanodeManager.java does it like this:
> >
> >     final long heartbeatIntervalSeconds = conf.getLong(
> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
> >     final int heartbeatRecheckInterval = conf.getInt(
> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT);
> > // 5 minutes
> >     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
> >         + 10 * 1000 * heartbeatIntervalSeconds;
>
>
> Good to know.
>
> > Under default configuration, dfs.namenode.heartbeat.recheck-interval is
> > 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
> > values into the formula, we get 10.5 minutes, which agrees with your
> > observation.  If you change dfs.namenode.heartbeat.recheck-interval to
> > 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
> > before a datanode is marked dead.
> >
> > dfs.namenode.heartbeat.recheck-interval is not documented in
> > hdfs-default.xml, though I don't recall if that's an intentional choice
> > or just an oversight.  The value of the property must be expressed in
> > milliseconds.
>
> This did the trick. Thank you very much. For testing purposes I've set it
> to 10000, and after approx. 45s the node was marked as dead.
>
> Is there any chance of getting this property documented, so that possible
> behavior changes in future releases can be spotted before the staging area?
>
> cheers,
> Frank
>

Re: Time until a datanode is marked as dead

Posted by Frank Lanitz <fr...@sql-ag.de>.

Hi,

On 23.01.2015 at 19:23, Chris Nauroth wrote:
> The time period for determining if a datanode is dead is calculated as a
> function of a few different configuration properties.  The current
> implementation in DatanodeManager.java does it like this:
> 
>     final long heartbeatIntervalSeconds = conf.getLong(
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
>     final int heartbeatRecheckInterval = conf.getInt(
>         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
>         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT); // 5 minutes
>     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
>         + 10 * 1000 * heartbeatIntervalSeconds;


Good to know.

> Under default configuration, dfs.namenode.heartbeat.recheck-interval is
> 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
> values into the formula, we get 10.5 minutes, which agrees with your
> observation.  If you change dfs.namenode.heartbeat.recheck-interval to
> 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
> before a datanode is marked dead.
> 
> dfs.namenode.heartbeat.recheck-interval is not documented in
> hdfs-default.xml, though I don't recall if that's an intentional choice
> or just an oversight.  The value of the property must be expressed in
> milliseconds.

This did the trick. Thank you very much. For testing purposes I've set it
to 10000, and after approx. 45s the node was marked as dead.

Is there any chance of getting this property documented, so that possible
behavior changes in future releases can be spotted before the staging area?

cheers,
Frank

Re: Time until a datanode is marked as dead

Posted by Chris Nauroth <cn...@hortonworks.com>.

Hi Frank,

The time period for determining if a datanode is dead is calculated as a
function of a few different configuration properties.  The current
implementation in DatanodeManager.java does it like this:

    final long heartbeatIntervalSeconds = conf.getLong(
        DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
        DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
    final int heartbeatRecheckInterval = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
        DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT); // 5 minutes
    this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
        + 10 * 1000 * heartbeatIntervalSeconds;

Under default configuration, dfs.namenode.heartbeat.recheck-interval is 5
minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those values
into the formula, we get 10.5 minutes, which agrees with your observation.
If you change dfs.namenode.heartbeat.recheck-interval to 2.5 minutes, then
you'll achieve an effective timeout of 5.5 minutes before a datanode is
marked dead.
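
As a rough illustration of that arithmetic, here is a small standalone
sketch. It only re-applies the formula quoted above; the class name
DeadNodeTimeout and the method expireIntervalMs are made up for this example
and are not part of Hadoop. The 3-second heartbeat is the default mentioned
above, and 10000 ms is simply a short value that is convenient for testing.

```java
public class DeadNodeTimeout {
    // Hypothetical helper, not a Hadoop API. It mirrors the expression from
    // DatanodeManager.java quoted above:
    //   2 * recheck interval (ms) + 10 * 1000 * heartbeat interval (s)
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSeconds;
    }

    public static void main(String[] args) {
        long heartbeatSeconds = 3; // dfs.heartbeat.interval default, in seconds

        // Candidate values for dfs.namenode.heartbeat.recheck-interval, in ms:
        // 5 minutes (default), 2.5 minutes, and a short 10-second test value.
        long[] recheckValuesMs = {300_000L, 150_000L, 10_000L};

        for (long recheckMs : recheckValuesMs) {
            long expireMs = expireIntervalMs(recheckMs, heartbeatSeconds);
            System.out.println("recheck-interval " + recheckMs
                + " ms -> datanode marked dead after " + expireMs + " ms");
        }
    }
}
```

With the 3-second heartbeat, the heartbeat term contributes a fixed 30
seconds, so the recheck interval is the only practical knob for bringing the
timeout down to a few minutes.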

dfs.namenode.heartbeat.recheck-interval is not documented in
hdfs-default.xml, though I don't recall if that's an intentional choice or
just an oversight.  The value of the property must be expressed in
milliseconds.
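
For the 2.5-minute case, the hdfs-site.xml entry would look roughly like the
fragment below. The value 150000 is just 2.5 minutes converted to
milliseconds; the description text is my own wording, and I am assuming the
property is set in the NameNode's configuration, since that is where
DatanodeManager reads it.

```xml
<property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <!-- 2.5 minutes, expressed in milliseconds -->
    <value>150000</value>
    <description>Heartbeat recheck interval in milliseconds</description>
</property>
```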

Chris Nauroth
Hortonworks
http://hortonworks.com/

On Thu, Jan 22, 2015 at 11:19 PM, Frank Lanitz <fr...@sql-ag.de>
wrote:

> Hi,
>
> I'm trying to configure the time after which a datanode is considered dead.
> Currently it appears to be set to about 10min, which is a
> little too high for my scenario. As I wasn't able to find an obvious
> flag, I've tried setting some properties that might do that,
> without success. E.g. I've put this into my hdfs-site.xml:
>
> <property>
>     <name>dfs.namenode.check.stale.datanode</name>
>     <value>true</value>
>     <description>Activate stale check</description>
> </property>
>
> <property>
>     <name>dfs.namenode.stale.datanode.interval</name>
>     <value>10</value>
>     <description>Timeout</description>
> </property>
>
> So my question is: Which option(s) do I have to set in order to e.g.
> decrease the time needed to mark a datanode as dead to 5min, running 2.6?
>
> Cheers,
> Frank
>

