Posted to common-user@hadoop.apache.org by Vasilis Liaskovitis <vl...@gmail.com> on 2009/09/17 01:15:50 UTC

filesystem counters HDFS_BYTES vs FILE_BYTES

Hi,

In the filesystem counters reported for each job, what is the
difference between HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN?

- Do they refer to disjoint data, perhaps HDFS metadata and map/reduce
application data respectively?
- Another interpretation is that HDFS_BYTES refers to bytes
"virtually" written to HDFS, without taking replicated blocks into
account, whereas FILE_BYTES shows the real I/O for all replicas of the
data blocks. With dfs.replication=3, I'd expect FILE_BYTES to be 3
times HDFS_BYTES, but this is not the case on a sorting job:
FILE_BYTES_WRITTEN=56497924338
HDFS_BYTES_WRITTEN=29445895448
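
To make the mismatch concrete, here is a quick check of the ratio, as a minimal sketch in Python. Only the two counter values come from the job output; the variable names are mine, and the factor of 3 is the dfs.replication setting assumed in the second interpretation.

```python
# Counter values copied from the sorting job's output.
file_bytes_written = 56497924338
hdfs_bytes_written = 29445895448

# Assumption: dfs.replication=3, as in the second interpretation.
replication = 3

# Observed ratio between the two counters.
ratio = file_bytes_written / hdfs_bytes_written

# What FILE_BYTES_WRITTEN would be if it counted every replica.
expected_if_replicas_counted = hdfs_bytes_written * replication

print(f"FILE_BYTES_WRITTEN / HDFS_BYTES_WRITTEN = {ratio:.2f}")
print(f"{replication} x HDFS_BYTES_WRITTEN = {expected_if_replicas_counted}")
print(f"difference from observed FILE_BYTES_WRITTEN = "
      f"{expected_if_replicas_counted - file_bytes_written}")
```

The observed ratio is a bit under 2 rather than 3, so the replica-counting interpretation does not seem to fit these numbers.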

Is there a description of these counters on the wiki somewhere?
More generally, is there a description of all the generic filesystem
and map/reduce counters?
Thanks,

- Vasilis