Posted to common-user@hadoop.apache.org by Dennis Kubes <nu...@dragonflymc.com> on 2006/03/26 22:11:50 UTC

Hadoop File Capacity

For the Hadoop filesystem, I know that it is basically unlimited in terms of
storage because one can always add new hardware, but is it unlimited in
terms of the size of a single file?

What I mean by this is: if I store a file /user/dir/a.index and this file
has, say, 100 blocks in it, where there is only enough space on any one
server for 10 blocks, will the Hadoop filesystem store and replicate
different blocks on different servers and give the client a single-file
view, or does a whole file have to be stored and replicated across machines?

Dennis
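
For illustration, here is a minimal sketch (not part of the original thread)
of how a client could ask the namenode where the blocks of a single file
ended up, using the Hadoop FileSystem API. The path /user/dir/a.index is just
the hypothetical file from the question, the class name is made up for the
example, and the exact API details may vary between Hadoop versions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.util.Arrays;

    public class BlockLocationsSketch {
        public static void main(String[] args) throws Exception {
            // Picks up cluster settings from core-site.xml / hdfs-site.xml.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The hypothetical file from the question above.
            Path file = new Path("/user/dir/a.index");
            FileStatus status = fs.getFileStatus(file);

            // Ask the namenode which datanodes hold each block of the file.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (int i = 0; i < blocks.length; i++) {
                System.out.println("block " + i + " on hosts: "
                    + Arrays.toString(blocks[i].getHosts()));
            }
        }
    }

Run against a file too large for any single node, this should print a
different set of hosts per block rather than one machine holding the whole
file.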


Re: Hadoop File Capacity

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
yes

On Mar 26, 2006, at 12:11 PM, Dennis Kubes wrote:

> For the Hadoop filesystem, I know that it is basically unlimited in terms of
> storage because one can always add new hardware, but is it unlimited in
> terms of the size of a single file?
>
> What I mean by this is: if I store a file /user/dir/a.index and this file
> has, say, 100 blocks in it, where there is only enough space on any one
> server for 10 blocks, will the Hadoop filesystem store and replicate
> different blocks on different servers and give the client a single-file
> view, or does a whole file have to be stored and replicated across machines?
>
> Dennis
>


RE: Hadoop File Capacity

Posted by Yoram Arnon <ya...@yahoo-inc.com>.
Each block of each file is scattered across (currently) three random data
nodes, unrelated to the placement of the previous block. So no, there are no
limits on file size until you reach the FS limits, which are reasonably high
and growing (probably a couple hundred TB currently).
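
To make that concrete, here is a minimal sketch (again not from the original
thread, and assuming a standard Hadoop client classpath and configuration):
the client streams a file to HDFS block by block, the namenode picks a set of
datanodes for each block independently, and the replication factor is a
per-file property that can be read back or changed afterwards.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/dir/a.index");

            // The client writes the file block by block; for each block the
            // namenode picks a set of datanodes (dfs.replication of them,
            // 3 by default), independent of where earlier blocks landed.
            FSDataOutputStream out = fs.create(file);
            out.writeUTF("example payload");
            out.close();

            // The replication factor is per file and can be read back or
            // changed after the file is written.
            fs.setReplication(file, (short) 3);
            System.out.println("replication factor: "
                + fs.getFileStatus(file).getReplication());
        }
    }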

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com] 
Sent: Sunday, March 26, 2006 12:12 PM
To: hadoop-user@lucene.apache.org
Subject: Hadoop File Capacity

For the Hadoop filesystem, I know that it is basically unlimited in terms of
storage because one can always add new hardware, but is it unlimited in
terms of the size of a single file?

What I mean by this is: if I store a file /user/dir/a.index and this file
has, say, 100 blocks in it, where there is only enough space on any one
server for 10 blocks, will the Hadoop filesystem store and replicate
different blocks on different servers and give the client a single-file
view, or does a whole file have to be stored and replicated across machines?

Dennis