Posted to user@hadoop.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/11/22 21:11:00 UTC

Diskspace usage

Hi,

A quick question about the way Hadoop uses disk space.

Let's say I have 8 nodes: 7 of them with a 2 TB disk, and one with a 256 GB disk.

Is Hadoop going to use the 256 GB node until it's full, then continue
with the other nodes only while keeping the 256 GB node live? Or will
it bring the 256 GB node down once it is full (as it does for
failures) and continue with the 7 remaining nodes?

To summarize, does Hadoop take drive size into account?

Thanks,

JM

Re: Diskspace usage

Posted by Harsh J <ha...@cloudera.com>.
Hi,

HDFS by default writes to its disks in a round-robin fashion, which
means your 256 GB disk will indeed fill up faster than the rest over
time.
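
As a side note, newer Apache Hadoop releases (via HDFS-1804) also
offer a space-aware alternative to pure round robin. A minimal
hdfs-site.xml sketch, assuming a release that ships
AvailableSpaceVolumeChoosingPolicy:

  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
    <!-- Prefer volumes with more free space instead of pure round robin -->
  </property>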

When a disk is full, it is excluded from writes until some block data
is deleted from it (as part of regular HDFS file deletes). However,
blocks already on the disk are still served for reads as normal. The
disk isn't marked failed or ejected; it is simply not written to
anymore (i.e. no longer selected in the round robin of block writes)
until it has usable space again.
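
If you'd rather not let a data volume run literally full, you can
also reserve per-volume headroom for non-HDFS use in hdfs-site.xml;
the 10 GB value below is just an illustration, pick your own:

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
    <!-- Bytes kept free per volume for non-HDFS use (10 GB here) -->
  </property>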

So, tl;dr: this is a no-worry situation.

However, is this 256 GB disk the root (i.e. OS) disk? We usually
recommend not using the OS disk for DataNode (DN) data storage, since
a misconfiguration can fill it up, which could lead to weird issues
showing up in the OS.
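
If it is the OS disk, the cleanest fix is to point the DN at
dedicated data mounts only. A sketch for hdfs-site.xml, with
hypothetical mount points (note the property is named dfs.data.dir on
1.x releases):

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
    <!-- Dedicated data mounts only; keep the OS/root disk out of this list -->
  </property>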

On Fri, Nov 23, 2012 at 1:41 AM, Jean-Marc Spaggiari
<je...@spaggiari.org> wrote:
> Hi,
>
> A quick question about the way Hadoop uses disk space.
>
> Let's say I have 8 nodes: 7 of them with a 2 TB disk, and one with a 256 GB disk.
>
> Is Hadoop going to use the 256 GB node until it's full, then continue
> with the other nodes only while keeping the 256 GB node live? Or will
> it bring the 256 GB node down once it is full (as it does for
> failures) and continue with the 7 remaining nodes?
>
> To summarize, does Hadoop take drive size into account?
>
> Thanks,
>
> JM



-- 
Harsh J
