Posted to user@hadoop.apache.org by "hehaoreset@gmail.com" <he...@gmail.com> on 2022/10/21 09:22:01 UTC

HDFS DataNode unavailable

I have an HDFS cluster, version 2.7.2, with two namenodes and three datanodes.
While uploading a file, an exception is thrown: java.io.IOException: Got
error, status message, ack with firstBadLink as X:50010.

I noticed that the datanode log has stopped being written: only datanode.log.1
exists, not datanode.log. The other process logs are normal. The HDFS log
directory is out of space. I did nothing but restart all the datanodes, and
HDFS went back to normal.

What's the reason?

Sent from [Mail](https://go.microsoft.com/fwlink/?LinkId=550986) for Windows





Re: HDFS DataNode unavailable

Posted by "hehaoreset@gmail.com" <he...@gmail.com>.
Hello Chris Nauroth,



Thank you for your advice. I just saw your email. I will check the last
entries in the log. I will also look into upgrading the cluster in the
near future.



Thank you very much.



He Hao







Sent from [Mail](https://go.microsoft.com/fwlink/?LinkId=550986) for Windows



**From:** [Chris Nauroth](mailto:cnauroth@apache.org)  
**Sent:** October 26, 2022, 0:41  
**To:** [hehaoreset@gmail.com](mailto:hehaoreset@gmail.com)  
**Cc:** [user@hadoop.apache.org](mailto:user@hadoop.apache.org)  
**Subject:** Re: HDFS DataNode unavailable



Hello,



I think broadly there could be 2 potential root cause explanations:



1. Logs are routed to a volume that is too small to hold the expected
logging. You can review configuration settings in log4j.properties related to
the rolling file appender. This determines how large logs can get and how many
of the old rolled files to retain. If the maximum would exceed the capacity on
the volume holding these logs, then you either need to configure smaller
retention or redirect the logs to a larger volume.



2. Some error condition caused abnormal log spam. If the log isn't there
anymore, then it's difficult to say what this could have been specifically.
You could keep an eye on logs for the next few days after the restart to see
if there are a lot of unexpected errors.



On a separate note, version 2.7.2 is quite old, released in 2017. It's missing
numerous bug fixes and security patches. I recommend looking into an upgrade
to 2.10.2 in the short term, followed by a plan for getting onto a currently
supported 3.x release.



I hope this helps.

  

Chris Nauroth





On Mon, Oct 24, 2022 at 11:31 PM
[hehaoreset@gmail.com](mailto:hehaoreset@gmail.com)
<[hehaoreset@gmail.com](mailto:hehaoreset@gmail.com)> wrote:

> I have an HDFS cluster, version 2.7.2, with two namenodes and three
> datanodes. While uploading a file, an exception is thrown:
> java.io.IOException: Got error, status message, ack with firstBadLink as
> X:50010.
>
> I noticed that the datanode log has stopped being written: only
> datanode.log.1 exists, not datanode.log. The other process logs are normal.
> The HDFS log directory is out of space. I did nothing but restart all the
> datanodes, and HDFS went back to normal.
>
> What's the reason?
>
> Sent from [Mail](https://go.microsoft.com/fwlink/?LinkId=550986) for Windows
>
>  






Re: HDFS DataNode unavailable

Posted by Chris Nauroth <cn...@apache.org>.
Hello,

I think broadly there could be 2 potential root cause explanations:

1. Logs are routed to a volume that is too small to hold the expected
logging. You can review configuration settings in log4j.properties related
to the rolling file appender. This determines how large logs can get and
how many of the old rolled files to retain. If the maximum would exceed the
capacity on the volume holding these logs, then you either need to
configure smaller retention or redirect the logs to a larger volume (see
the configuration sketch after point 2).

2. Some error condition caused abnormal log spam. If the log isn't there
anymore, then it's difficult to say what this could have been specifically.
You could keep an eye on logs for the next few days after the restart to
see if there are a lot of unexpected errors.
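
For reference, here is a minimal sketch of the log4j.properties settings that
typically control rolling and retention of the Hadoop 2.x daemon logs. The
property names and values below are illustrative defaults, not a prescription
for this cluster; verify them against the log4j.properties actually deployed
in your configuration directory.

    # Sketch only: confirm names and values against your own log4j.properties.
    # Maximum size of each daemon log before it rolls, and how many rolled
    # files (datanode.log.1, .2, ...) to keep.
    hadoop.log.maxfilesize=256MB
    hadoop.log.maxbackupindex=20

    # Rolling File Appender (RFA) wired to those properties.
    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n

With settings like these, the worst-case disk usage per daemon is roughly
MaxFileSize * (MaxBackupIndex + 1), so that figure can be compared against
the free space on the volume that holds hadoop.log.dir. The appender only
applies if the daemon's root logger points at it (typically
HADOOP_ROOT_LOGGER=INFO,RFA, which the 2.x daemon scripts set by default).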

On a separate note, version 2.7.2 is quite old, released in 2017. It's
missing numerous bug fixes and security patches. I recommend looking into
an upgrade to 2.10.2 in the short term, followed by a plan for getting onto
a currently supported 3.x release.

I hope this helps.

Chris Nauroth


On Mon, Oct 24, 2022 at 11:31 PM hehaoreset@gmail.com <he...@gmail.com>
wrote:

> I have an HDFS cluster, version 2.7.2, with two namenodes and three
> datanodes. While uploading a file, an exception is thrown:
> java.io.IOException: Got error, status message, ack with firstBadLink as
> X:50010.
>
> I noticed that the datanode log has stopped being written: only
> datanode.log.1 exists, not datanode.log. The other process logs are normal.
> The HDFS log directory is out of space. I did nothing but restart all the
> datanodes, and HDFS went back to normal.
>
> What's the reason?
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
>