Posted to common-user@hadoop.apache.org by Dejan Menges <de...@gmail.com> on 2015/06/10 13:22:15 UTC

When is DataNode 'bad'?

Hi,

From time to time I see some reduce tasks failing with this:

Error: java.io.IOException: Failed to replace a bad datanode on the
existing pipeline due to no more good datanodes being available to try. The
current failed datanode replacement policy is DEFAULT, and a client may
configure this via
'dfs.client.block.write.replace-datanode-on-failure.policy' in its
configuration.
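
For context, my understanding is that this is a client-side setting, so
something like the following should override it (just a sketch; the class
name is made up, and I'm going off the property name from the error
message):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class WritePolicySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Property named in the error message; as far as I can tell
            // the valid values are NEVER, DEFAULT and ALWAYS.
            conf.set(
                "dfs.client.block.write.replace-datanode-on-failure.policy",
                "DEFAULT");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Writing through " + fs.getUri());
        }
    }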

I don't see any issues in HDFS during this period (for example, for the
specific node on which this happened, I checked the logs, and the only
thing happening at that point was a pipeline recovery).

So I'm not quite sure how there can be no more good datanodes in a cluster
of 15 nodes with a replication factor of three?

Also, regarding
http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
- there is a parameter called
dfs.client.block.write.replace-datanode-on-failure.best-effort which I
cannot currently find. From which Hadoop version can this parameter be
used, and does it make sense to use it to avoid issues like the one above?
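
In case it clarifies what I'm after, this is what I'd expect the client
side to look like once we're on a version that supports it (a sketch only;
the best-effort property name comes from the Cloudera post above, and I'm
assuming it's a plain boolean):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class BestEffortSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Keep the default behaviour of trying to replace a failed
            // datanode in the write pipeline...
            conf.set(
                "dfs.client.block.write.replace-datanode-on-failure.policy",
                "DEFAULT");
            // ...but, if no replacement can be found, carry on writing
            // with the remaining datanodes instead of failing the write
            // (assumed boolean, per the Cloudera post).
            conf.setBoolean(
                "dfs.client.block.write.replace-datanode-on-failure.best-effort",
                true);
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Writing through " + fs.getUri());
        }
    }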

This is on Hadoop 2.4 (Hortonworks HDP 2.1), and we are currently
preparing an upgrade to HDP 2.2; I'm not sure whether this is a known
issue or something I'm missing.

Thanks a lot,
Dejan