Posted to hdfs-user@hadoop.apache.org by Adam Phelps <am...@opendns.com> on 2011/03/24 18:30:27 UTC

Datanode won't start with bad disk

We have a bad disk on one of our datanode machines. We have 
dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems 
while the DataNode process was running, but now that we need to 
restart the DataNode process it fails with:

2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker: 
Incorrect permissions were set on /var/lib/stats/hdfs/4, expected: 
rwxr-xr-x, while actual: ---------. Fixing...
2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader: 
Loaded the native-hadoop library
2011-03-24 16:50:20,091 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not 
permitted

In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk. 
It gets that permission error because we have the mount directory set 
to be immutable:

root@s3:/var/log/hadoop# lsattr  /var/lib/stats/hdfs/
------------------- /var/lib/stats/hdfs/2
----i------------e- /var/lib/stats/hdfs/4
------------------- /var/lib/stats/hdfs/3
------------------- /var/lib/stats/hdfs/1

We set the directory immutable because we'd previously seen HDFS just 
write to the local disk when a data disk couldn't be mounted.
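
For anyone who wants to reproduce the permission error, here is a minimal 
sketch on a scratch directory (the /tmp path is just an example; it assumes 
an ext3/ext4 filesystem, root, and the usual chattr/lsattr and mountpoint 
utilities):

root@s3:~# mkdir -p /tmp/immutable-demo
root@s3:~# chattr +i /tmp/immutable-demo     # same attribute as on /var/lib/stats/hdfs/4
root@s3:~# lsattr -d /tmp/immutable-demo
----i------------e- /tmp/immutable-demo
root@s3:~# chmod 755 /tmp/immutable-demo     # what DiskChecker's "Fixing..." step attempts
chmod: changing permissions of `/tmp/immutable-demo': Operation not permitted
root@s3:~# mountpoint /var/lib/stats/hdfs/4  # quick check that the data disk really isn't mounted
/var/lib/stats/hdfs/4 is not a mountpoint
root@s3:~# chattr -i /tmp/immutable-demo && rmdir /tmp/immutable-demo   # clean up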

HDFS is supposed to be able to handle failed disks, but it doesn't seem 
to be doing the right thing in this case.  Is this a known problem, or 
is there some other way we should be configuring things to allow the 
DataNode to come up in this situation?

(clearly we can remove the mount point from hdfs-site.xml, but that 
doesn't feel like the correct solution)
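
For reference, that stopgap would look roughly like this in hdfs-site.xml 
(the dfs.data.dir value below is assumed from the lsattr listing above, with 
the bad /var/lib/stats/hdfs/4 volume dropped; dfs.data.dir is the 0.20.x name 
for the data directory list):

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <!-- bad volume /var/lib/stats/hdfs/4 removed as a stopgap -->
  <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3</value>
</property>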

Thanks
- Adam

Re: Datanode won't start with bad disk

Posted by Allen Wittenauer <aw...@apache.org>.
On Mar 24, 2011, at 10:47 AM, Adam Phelps wrote:

> For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

	Given that this isn't a standard Apache release, you'll likely be better served by asking Cloudera.


Re: Datanode won't start with bad disk

Posted by "Aaron T. Myers" <at...@cloudera.com>.
bcc: hdfs-user@hadoop.apache.org
+ cdh-user@cloudera.org

Hey Adam,

Thanks a lot for the bug report. I've added cdh-user@ to this email, which
may be a more appropriate list for this question.

Best,
Aaron

--
Aaron T. Myers
Software Engineer, Cloudera



On Thu, Mar 24, 2011 at 10:47 AM, Adam Phelps <am...@opendns.com> wrote:

> For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.
>
> - Adam
>
>
> On 3/24/11 10:30 AM, Adam Phelps wrote:
>
>> We have a bad disk on one of our datanode machines. We have
>> dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems
>> while the DataNode process was running, but now that we need to
>> restart the DataNode process it fails with:
>>
>> 2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker:
>> Incorrect permissions were set on /var/lib/stats/hdfs/4, expected:
>> rwxr-xr-x, while actual: ---------. Fixing...
>> 2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader:
>> Loaded the native-hadoop library
>> 2011-03-24 16:50:20,091 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not
>> permitted
>>
>> In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
>> It gets that permission error because we have the mount directory set to
>> be immutable:
>>
>> root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
>> ------------------- /var/lib/stats/hdfs/2
>> ----i------------e- /var/lib/stats/hdfs/4
>> ------------------- /var/lib/stats/hdfs/3
>> ------------------- /var/lib/stats/hdfs/1
>>
>> We set the directory immutable because we'd previously seen HDFS just
>> write to the local disk when a data disk couldn't be mounted.
>>
>> HDFS is supposed to be able to handle failed disks, but it doesn't seem
>> to be doing the right thing in this case. Is this a known problem, or is
>> there some other way we should be configuring things to allow the
>> DataNode to come up in this situation?
>>
>> (clearly we can remove the mount point from hdfs-site.xml, but that
>> doesn't feel like the correct solution)
>>
>> Thanks
>> - Adam
>>
>>
>

Re: Datanode won't start with bad disk

Posted by Adam Phelps <am...@opendns.com>.
For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

- Adam

On 3/24/11 10:30 AM, Adam Phelps wrote:
> We have a bad disk on one of our datanode machines. We have
> dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems
> while the DataNode process was running, but now that we need to
> restart the DataNode process it fails with:
>
> 2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker:
> Incorrect permissions were set on /var/lib/stats/hdfs/4, expected:
> rwxr-xr-x, while actual: ---------. Fixing...
> 2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader:
> Loaded the native-hadoop library
> 2011-03-24 16:50:20,091 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not
> permitted
>
> In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
> It gets that permission error because we have the mount directory set to
> be immutable:
>
> root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
> ------------------- /var/lib/stats/hdfs/2
> ----i------------e- /var/lib/stats/hdfs/4
> ------------------- /var/lib/stats/hdfs/3
> ------------------- /var/lib/stats/hdfs/1
>
> We set the directory immutable because we'd previously seen HDFS just
> write to the local disk when a data disk couldn't be mounted.
>
> HDFS is supposed to be able to handle failed disks, but it doesn't seem
> to be doing the right thing in this case. Is this a known problem, or is
> there some other way we should be configuring things to allow the
> DataNode to come up in this situation?
>
> (clearly we can remove the mount point from hdfs-site.xml, but that
> doesn't feel like the correct solution)
>
> Thanks
> - Adam
>