Posted to common-user@hadoop.apache.org by ChihChun Chu <st...@gmail.com> on 2009/03/12 06:38:14 UTC

about block size

Hi,

I have a question about how to decide the block size.
As I understand it, the block size is related to the namenode's heap size
(how many blocks it can handle), the total storage capacity of the cluster,
the file sizes (which depend on the application, e.g. a 1TB log file), the
number of replicas, and MapReduce performance.
In Google's paper they used 64MB as their block size, while Yahoo and
Facebook seem to set the block size to 128MB. Hadoop's default value is
64MB. I don't know why 64MB or 128MB. Is that the result of the tradeoffs
I mentioned above? How do I decide the block size if I want to build my
application on Hadoop? Is there any criterion or formula?

Any opinions or comments will be appreciated.


stchu

Re: about block size

Posted by Doug Cutting <cu...@apache.org>.
One factor is that the block size should minimize the impact of disk
seeks. For example, if a disk seeks in 10ms and transfers at 100MB/s,
then a 1MB block spends as long seeking (10ms) as transferring (10ms),
so a good block size will be substantially larger than 1MB. With 100MB
blocks, seeks would only slow things down by about 1%.
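
To make that concrete, here's a quick back-of-the-envelope sketch in
Java (the 10ms seek and 100MB/s transfer figures are just the assumed
numbers above, not measurements):

// Seek overhead as a fraction of total read time for one block,
// assuming a single 10ms seek per block and 100MB/s sequential
// transfer. Real disks and workloads will vary.
public class SeekOverhead {
    public static void main(String[] args) {
        double seekMs = 10.0;         // assumed seek time per block
        double transferMBps = 100.0;  // assumed transfer rate
        for (long mb : new long[] {1, 64, 100, 128}) {
            double transferMs = mb / transferMBps * 1000.0;
            double overhead = seekMs / (seekMs + transferMs) * 100.0;
            System.out.printf("%4d MB block: seek is %.1f%% of read time%n",
                              mb, overhead);
        }
    }
}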

Another factor is that, unless files are smaller than the block size,
larger blocks mean fewer blocks, and fewer blocks make for a more
efficient namenode.
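
As a rough illustration of the namenode side (the ~150 bytes of heap
per block object below is a commonly quoted ballpark, not an exact
figure, and the 100TB of data is made up):

// Estimate block count and the namenode heap used by block metadata
// for a hypothetical 100TB of data. 150 bytes/block is a rule of
// thumb, not an exact cost.
public class BlockCount {
    public static void main(String[] args) {
        long totalData = 100L * 1024 * 1024 * 1024 * 1024;  // 100TB, assumed
        for (long bs : new long[] {64L << 20, 128L << 20}) {
            long blocks = totalData / bs;
            long heapMB = blocks * 150 / (1024 * 1024);
            System.out.printf("%3d MB blocks: %,d blocks, ~%d MB of heap%n",
                              bs >> 20, blocks, heapMB);
        }
    }
}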

The primary harm of blocks that are too large is that you can end up
with fewer map tasks than nodes, and not use your cluster optimally.
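
The same kind of sketch for the parallelism point (the 10GB file and
100-node cluster are made up; with the usual one-split-per-block input
format, the map tasks for a file is roughly fileSize / blockSize):

// Map tasks produced for a single input file at various block sizes,
// assuming one map task per block (the common default splitting).
public class MapTasks {
    public static void main(String[] args) {
        long fileBytes = 10L << 30;  // 10GB input file, assumed
        int nodes = 100;             // cluster size, assumed
        for (long bs : new long[] {64L << 20, 128L << 20, 1L << 30}) {
            long maps = (fileBytes + bs - 1) / bs;  // ceiling division
            System.out.printf("%4d MB blocks: %d map tasks on %d nodes%s%n",
                              bs >> 20, maps, nodes,
                              maps < nodes ? "  <- cluster underused" : "");
        }
    }
}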

Doug
