Posted to mapreduce-user@hadoop.apache.org by Thanh Hong Dai <hd...@tma.com.vn> on 2016/03/22 06:08:08 UTC

A question regarding memory usage on NameNode and replication

Hi,

 

To get to the point: Does the number of replicas of a block increase the
memory requirement on the NameNode, and by how much?

 

The calculation in this paper
https://www.usenix.org/legacy/publications/login/2010-04/openpdfs/shvachko.pdf
from Yahoo! assumes 200 bytes per metadata object, and with 1.5 blocks per
file, it needs 3 objects per file (1 for the file, 2 for the blocks). The
replication factor is not mentioned in the paper and does not participate in
the calculation.
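
To make sure I am reading the paper correctly, here is the arithmetic I have
in mind, written out in Python. The 100M-file scale and the rounding of 1.5
blocks up to 2 block objects are my own assumptions about what the paper
intends:

    files = 100_000_000               # example scale close to the one the paper discusses
    objects_per_file = 3              # 1 file object + 2 block objects (1.5 blocks rounded up)
    bytes_per_object = 200            # the paper's estimate
    total_bytes = files * objects_per_file * bytes_per_object
    print(total_bytes / 10**9, "GB")  # -> 60.0 GB, with no dependence on the replication factor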

 

This email
https://www.mail-archive.com/core-user@hadoop.apache.org/msg02835.html in
the mailing list assumes 150 bytes per metadata object, but it messes up the
calculation by an order of magnitude: 1M files (1 block each) will use 2M
metadata objects (1 for the file, 1 for the block), which results in 300MB,
not 3GB. This article
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ from Cloudera
cites the mail, but corrects the numbers so that the 3GB figure works out.
The replication factor is not mentioned in either case and does not
participate in the calculation.
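
For concreteness, the corrected arithmetic as I understand it (150 bytes per
object, 1 block per file; the 10M-file line is my guess at what Cloudera's
3GB figure is based on):

    files = 1_000_000
    objects = files * 2                        # 1 file object + 1 block object per file
    print(objects * 150 / 10**6, "MB")         # -> 300.0 MB, not 3 GB
    print(10_000_000 * 2 * 150 / 10**9, "GB")  # -> 3.0 GB if the example is 10M files instead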

 

This answer on StackOverflow
https://stackoverflow.com/questions/10764493/namenode-file-quantity-limit
adds two metadata objects (one for the file and one for the block) for each
replica, which does not match the method of calculation in the links above.
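
If I follow that answer's method, the same 1M-file example would instead
scale with the replication factor (replication factor 3 is just an
illustrative value here):

    files = 1_000_000
    replication = 3
    objects = files * 2 * replication   # file and block objects counted once per replica
    print(objects * 150 / 10**6, "MB")  # -> 900.0 MB, 3x the replication-independent figure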

 

Which one(s) of them is/are correct? Does replication use one metadata
object per block replica, or does it only slightly increase the size of the
metadata object?

 

 

Best regards,

Hong Dai Thanh