Posted to user@hadoop.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/12/28 18:12:59 UTC

Hadoop harddrive space usage

Hi,

Quick question regarding hard drive space usage.

Hadoop distributes data evenly across the cluster, so all the
nodes are going to receive almost the same quantity of data to store.

Now, if I have 2 directories configured on one node, is Hadoop going
to assign twice the quantity to this node? Or is each directory going
to receive half the load?

Thanks,

JM

Re: Hadoop harddrive space usage

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Perfect, thanks. That's what I was looking for.

I have a few nodes, all with 2TB drives, except one with 2x1TB. Which means
that in the end, for Hadoop, it's almost the same thing.

JM

2012/12/28, Robert Molina <rm...@hortonworks.com>:
> Hi Jean,
> Hadoop doesn't factor in the number of disks or directories, but rather
> mainly the allocated free space. Hadoop will do its best to spread the data
> evenly amongst the nodes. For instance, say you have 3 datanodes
> (replication factor 1), each with 10GB allocated, but one of the nodes
> splits its 10GB across two directories. If we try to store a file that
> takes up 3 blocks, Hadoop will still place 1 block on each node.
>
> Hope that helps.
>
> Regards,
> Robert
>
> On Fri, Dec 28, 2012 at 9:12 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> Hi,
>>
>> Quick question regarding hard drive space usage.
>>
>> Hadoop will distribute the data evenly on the cluster. So all the
>> nodes are going to receive almost the same quantity of data to store.
>>
>> Now, if on one node I have 2 directories configured, is hadoop going
>> to assign twice the quantity on this node? Or is each directory going
>> to receive half the load?
>>
>> Thanks,
>>
>> JM
>>
>

Re: Hadoop harddrive space usage

Posted by Robert Molina <rm...@hortonworks.com>.
Hi Jean,
Hadoop doesn't factor in the number of disks or directories, but rather
mainly the allocated free space. Hadoop will do its best to spread the data
evenly amongst the nodes. For instance, say you have 3 datanodes
(replication factor 1), each with 10GB allocated, but one of the nodes
splits its 10GB across two directories. If we try to store a file that
takes up 3 blocks, Hadoop will still place 1 block on each node.
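
For reference, multiple data directories on a datanode are given as a
comma-separated list in hdfs-site.xml via dfs.data.dir (renamed
dfs.datanode.data.dir in later releases); the datanode then round-robins
new block replicas across them. The paths below are placeholders:

```xml
<!-- hdfs-site.xml: two local disks backing one datanode -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
</property>
```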

Hope that helps.

Regards,
Robert

On Fri, Dec 28, 2012 at 9:12 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi,
>
> Quick question regarding hard drive space usage.
>
> Hadoop will distribute the data evenly on the cluster. So all the
> nodes are going to receive almost the same quantity of data to store.
>
> Now, if on one node I have 2 directories configured, is hadoop going
> to assign twice the quantity on this node? Or is each directory going
> to receive half the load?
>
> Thanks,
>
> JM
>
