Posted to common-user@hadoop.apache.org by Wasim Bari <wa...@msn.com> on 2009/06/05 18:51:00 UTC

Max. Possible No. of Files

Hi,
     Does anyone have data on the maximum possible number of files in HDFS?

My second question: I created up to one lakh (100,000) small files with a small block size and read them back from HDFS; reading performance remained almost unaffected as the number of files increased.

The possible reasons I could think of are:

1. One lakh isn't a big enough number to disturb HDFS performance (I used 1 namenode and 4 datanodes).

2. Reads go directly to the datanodes after an initial interaction with the namenode, so reading from different nodes doesn't affect performance.


If someone could confirm or correct this, it would be highly appreciated.

Cheers,
Wasim
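
A minimal sketch of the kind of setup described above, assuming the standard
Hadoop FileSystem Java API; the path, payload, block size, and file count are
illustrative rather than taken from the original test:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SmallFileTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            byte[] payload = "small payload".getBytes();
            long blockSize = 1024L * 1024;   // 1 MB block size (illustrative)
            int numFiles = 100000;           // roughly "one lakh" files

            // Write many small files, each with an explicitly small block size.
            for (int i = 0; i < numFiles; i++) {
                Path p = new Path("/tmp/smallfiles/file-" + i);
                FSDataOutputStream out = fs.create(p, true, 4096, (short) 3, blockSize);
                out.write(payload);
                out.close();
            }

            // Read one file back. After the initial block lookup on the
            // namenode, the bytes are streamed directly from a datanode.
            FSDataInputStream in = fs.open(new Path("/tmp/smallfiles/file-0"));
            byte[] buf = new byte[payload.length];
            in.readFully(buf);
            in.close();

            fs.close();
        }
    }

Note that each file here is smaller than its block size, so it still occupies
exactly one block on the namenode; a smaller block size mainly changes the
block count for files larger than that size.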

Re: Max. Possible No. of Files

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
There are some name-node memory estimates in this JIRA:
http://issues.apache.org/jira/browse/HADOOP-1687

With 16 GB you can normally have 60 million objects (files
+ blocks) on the name-node. The number of files depends
on the file-to-block ratio.

--Konstantin
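
(For illustration only: if files average 1.5 blocks each, every file accounts
for about 2.5 namenode objects, so 60 million objects corresponds to roughly
24 million files; with exactly one block per file it is roughly 30 million.)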


Brian Bockelman wrote:
> 
> On Jun 5, 2009, at 11:51 AM, Wasim Bari wrote:
> 
>> Hi,
>>     Does anyone have data on the maximum possible number of files
>> in HDFS?
>>
> 
> Hey Wasim,
> 
> I don't think that there is a maximum limit.  Remember:
> 1) Less is better.  HDFS is optimized for big files.
> 2) The amount of memory the HDFS namenode needs is a function of the 
> number of files.  If you have a huge number of files, you get a huge 
> memory requirement.
> 
> 1-2 million files is fairly safe if you have a normal-looking namenode 
> server (8-16GB RAM).  I know some of our UCSD colleagues just ran a test 
> where they were able to put more than .5M files in a single directory 
> and still have a useable file system.
> 
> Brian
> 
>> My second question: I created up to one lakh (100,000) small files
>> with a small block size and read them back from HDFS; reading
>> performance remained almost unaffected as the number of files
>> increased.
>>
>> The possible reasons I could think of are:
>>
>> 1. One lakh isn't a big enough number to disturb HDFS performance
>> (I used 1 namenode and 4 datanodes).
>>
>> 2. Reads go directly to the datanodes after an initial interaction
>> with the namenode, so reading from different nodes doesn't affect
>> performance.
>>
>>
>> If someone could confirm or correct this, it would be highly
>> appreciated.
>>
>> Cheers,
>> Wasim
> 
> 

Re: Max. Possible No. of Files

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Jun 5, 2009, at 11:51 AM, Wasim Bari wrote:

> Hi,
>     Does anyone have data on the maximum possible number of files
> in HDFS?
>

Hey Wasim,

I don't think that there is a maximum limit.  Remember:
1) Less is better.  HDFS is optimized for big files.
2) The amount of memory the HDFS namenode needs is a function of the  
number of files.  If you have a huge number of files, you get a huge  
memory requirement.

1-2 million files is fairly safe if you have a normal-looking namenode  
server (8-16GB RAM).  I know some of our UCSD colleagues just ran a  
test where they were able to put more than .5M files in a single  
directory and still have a useable file system.

Brian
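
To gauge how much namespace a cluster is already carrying, a quick sketch
using the FileSystem API (the path is illustrative; "hadoop fs -count /"
gives a similar summary from the shell):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NamespaceCount {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Every file and directory counted here is a namenode object;
            // each block adds another.
            ContentSummary summary = fs.getContentSummary(new Path("/"));
            System.out.println("directories: " + summary.getDirectoryCount());
            System.out.println("files:       " + summary.getFileCount());
            System.out.println("bytes:       " + summary.getLength());

            fs.close();
        }
    }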

> My second question: I created up to one lakh (100,000) small files
> with a small block size and read them back from HDFS; reading
> performance remained almost unaffected as the number of files
> increased.
>
> The possible reasons I could think of are:
>
> 1. One lakh isn't a big enough number to disturb HDFS performance
> (I used 1 namenode and 4 datanodes).
>
> 2. Reads go directly to the datanodes after an initial interaction
> with the namenode, so reading from different nodes doesn't affect
> performance.
>
>
> If someone could confirm or correct this, it would be highly
> appreciated.
>
> Cheers,
> Wasim