Posted to common-user@hadoop.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2015/04/18 07:31:03 UTC

The correct way to find the number of mappers and reducers

Hi,

There are good guides on choosing the number of mappers and reducers for a Hadoop job. For example:

Running Hadoop on Ubuntu Linux (Single-Node Cluster)    http://goo.gl/kaA1h5
Partitioning your job into maps and reduces     http://goo.gl/tpU23

However, I have some (admittedly noob) questions. Assume:

A. There are 32 cores on the machine.
B. Hadoop is set up on a single machine (single-node cluster).
C. There are more than 100 files in HDFS, each 67 MB.

Now the questions are:

1) How can I determine the DFS block size? It is stated that "The number of maps is usually driven by the number of DFS blocks in the input files". (My attempt at checking this is sketched after this list.)
2) What is the default value of io.file.buffer.size? I haven't set it anywhere (my guess follows below).
3) Where exactly should I add those options, e.g. the number of mappers and reducers? (A sketch of what I think the job driver looks like is also below.)
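On question 1: from what I've read (and please correct me if I'm wrong), the block size is controlled by dfs.blocksize in hdfs-site.xml; the default is 64 MB in older releases and 128 MB in Hadoop 2.x. I believe it can be checked with "hdfs getconf -confKey dfs.blocksize", or programmatically. A minimal sketch of what I mean, assuming the Java FileSystem API (the class name is just my example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Block size that would be used for newly created files
        System.out.println("default block size: "
            + fs.getDefaultBlockSize(new Path("/")));
        // Block size of an existing file, e.g. one of the 67 MB inputs
        FileStatus st = fs.getFileStatus(new Path(args[0]));
        System.out.println(args[0] + ": " + st.getBlockSize());
      }
    }

If I understand the quoted statement correctly, with a 64 MB block size each 67 MB file spans two blocks (so roughly two maps per file), while with 128 MB each file fits in a single block and gives one map.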
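On question 2: as far as I can tell, the default for io.file.buffer.size is 4096 bytes (it comes from core-default.xml and can be overridden in core-site.xml). A small sketch of reading the effective value, again assuming the Java API:

    import org.apache.hadoop.conf.Configuration;

    public class BufferSizeCheck {
      public static void main(String[] args) {
        // Configuration loads core-default.xml and then core-site.xml
        Configuration conf = new Configuration();
        // Falls back to 4096 if the key is set nowhere
        System.out.println("io.file.buffer.size = "
            + conf.getInt("io.file.buffer.size", 4096));
      }
    }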
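On question 3: my understanding so far is that per-job options go either into the job driver code or onto the command line as -D key=value (when the driver uses ToolRunner/GenericOptionsParser), while site-wide defaults go into mapred-site.xml. The number of reducers can be set directly, but the number of mappers apparently cannot; it follows the input splits. A sketch of what I think the driver looks like (class and job name are just examples):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my-job");
        // Reducers can be set explicitly per job:
        job.setNumReduceTasks(8);   // or: -D mapreduce.job.reduces=8
        // Mappers cannot be forced this way; the count follows the
        // input splits (roughly one map per DFS block of input).
        // ... mapper/reducer classes and input/output paths go here ...
      }
    }

If that's right, then on a single 32-core node the number of tasks that actually run at the same time is capped by the cluster's slot/container configuration, not by these job settings.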


 
Regards,
Mahmood