Posted to common-user@hadoop.apache.org by Brian Bockelman <bb...@cse.unl.edu> on 2008/12/17 21:02:32 UTC

Datanode handling of single disk failure

Hello all,

I'd like to push the datanode's capability to handle multiple
directories to a somewhat extreme degree, and get feedback on how well
this might work.

We have a few large RAID servers (12 to 48 disks) which we'd like to
transition to Hadoop.  I'd like to mount each of the disks
individually (i.e., /mnt/disk1, /mnt/disk2, ....) and take advantage
of Hadoop's replication - instead of paying the overhead of setting up
RAID and then still paying the overhead of replication.
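
For concreteness, here's roughly the configuration I have in mind - a
sketch assuming a 0.19-era hadoop-site.xml, with mount points that are
just examples of our layout:

  <property>
    <name>dfs.data.dir</name>
    <!-- comma-separated list; the datanode spreads new blocks
         round-robin across these volumes -->
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
  </property>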

However, we're a bit concerned about how well Hadoop might handle one
of the directories disappearing from underneath it.  If a single
volume, say, /mnt/disk1, starts returning I/O errors, is Hadoop smart
enough to figure out that this whole volume is broken?  Or will we
have to restart the datanode after any disk failure for it to scan
the directory and realize everything is broken?  What happens if you
start up the datanode with a data directory that it can't write into?

Is anyone running in this fashion (i.e., multiple data directories  
corresponding to different disk volumes ... even better if you're  
doing it with more than a few disks)?

Brian


Re: Datanode handling of single disk failure

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Thank you, Konstantin - this information will be useful.

Brian

On Dec 19, 2008, at 12:37 PM, Konstantin Shvachko wrote:

>
> Brian Bockelman wrote:
>> Hello all,
>> I'd like to push the datanode's capability to handle multiple
>> directories to a somewhat extreme degree, and get feedback on how
>> well this might work.
>> We have a few large RAID servers (12 to 48 disks) which we'd like
>> to transition to Hadoop.  I'd like to mount each of the disks
>> individually (i.e., /mnt/disk1, /mnt/disk2, ....) and take
>> advantage of Hadoop's replication - instead of paying the overhead
>> of setting up RAID and then still paying the overhead of
>> replication.
>
> In my experience this is the right way to go.
>
>> However, we're a bit concerned about how well Hadoop might handle
>> one of the directories disappearing from underneath it.  If a
>> single volume, say, /mnt/disk1, starts returning I/O errors, is
>> Hadoop smart enough to figure out that this whole volume is
>> broken?  Or will we have to restart the datanode after any disk
>> failure for it to scan the directory and realize everything is
>> broken?  What happens if you start up the datanode with a data
>> directory that it can't write into?
>
> In the current implementation, if at any point the datanode detects
> an unwritable or unreadable drive, it shuts itself down, logging a
> message about what went wrong and reporting the problem to the
> name-node. So yes, if such a thing happens, you will have to restart
> the data-node. But since the cluster takes care of data-node failures
> by re-replicating lost blocks, that should not be a problem.
>
>> Is anyone running in this fashion (i.e., multiple data directories  
>> corresponding to different disk volumes ... even better if you're  
>> doing it with more than a few disks)?
>
> We have extensive experience running four drives per data-node (no
> RAID), so this is not something new or untested.
>
> Thanks,
> --Konstantin


Re: Datanode handling of single disk failure

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Brian Bockelman wrote:
> Hello all,
> 
> I'd like to push the datanode's capability to handle multiple
> directories to a somewhat extreme degree, and get feedback on how well
> this might work.
> 
> We have a few large RAID servers (12 to 48 disks) which we'd like to
> transition to Hadoop.  I'd like to mount each of the disks individually
> (i.e., /mnt/disk1, /mnt/disk2, ....) and take advantage of Hadoop's
> replication - instead of paying the overhead of setting up RAID and
> then still paying the overhead of replication.

In my experience this is the right way to go.

> However, we're a bit concerned about how well Hadoop might handle one
> of the directories disappearing from underneath it.  If a single
> volume, say, /mnt/disk1, starts returning I/O errors, is Hadoop smart
> enough to figure out that this whole volume is broken?  Or will we
> have to restart the datanode after any disk failure for it to scan the
> directory and realize everything is broken?  What happens if you start
> up the datanode with a data directory that it can't write into?

In the current implementation, if at any point the datanode detects an
unwritable or unreadable drive, it shuts itself down, logging a message
about what went wrong and reporting the problem to the name-node.
So yes, if such a thing happens, you will have to restart the data-node.
But since the cluster takes care of data-node failures by re-replicating
lost blocks, that should not be a problem.
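
Recovery is just the usual daemon restart once the bad disk is repaired
or removed from dfs.data.dir. A rough sketch, assuming the standard
start-up scripts shipped with the release:

  # on the affected node: bring the data-node back up
  bin/hadoop-daemon.sh start datanode

  # from any node: check that block re-replication has caught up
  bin/hadoop fsck / -blocks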

> Is anyone running in this fashion (i.e., multiple data directories 
> corresponding to different disk volumes ... even better if you're doing 
> it with more than a few disks)?

We have extensive experience running four drives per data-node (no RAID),
so this is not something new or untested.

Thanks,
--Konstantin