Posted to common-user@hadoop.apache.org by Foss User <fo...@gmail.com> on 2009/05/19 21:52:38 UTC

My configuration in conf/hadoop-site.xml is not being used. Why?

I ran a job. In the JobTracker web interface, I found 4 map tasks and
1 reduce task running. This is not what I set in my configuration file
(hadoop-site.xml).

My configuration file, conf/hadoop-site.xml, is set as follows:

mapred.map.tasks = 2
mapred.reduce.tasks = 2

However, the description of these properties mentions that they are
ignored if mapred.job.tracker is set to 'local'. Mine is set properly,
with an IP address and port number. Please note that the above
configuration is from the conf/hadoop-site.xml file of the JobTracker
node.

I have also not overridden these settings in my Job class (Java code).
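One way to see which values a job submitted from a given machine would
actually pick up is to print them from a JobConf (a minimal sketch; the
class name ConfDump is just for illustration):

  import org.apache.hadoop.mapred.JobConf;

  public class ConfDump {
      public static void main(String[] args) {
          // JobConf loads hadoop-default.xml and hadoop-site.xml from the
          // classpath, exactly as a job submitted from this machine would.
          JobConf conf = new JobConf(ConfDump.class);
          System.out.println("mapred.job.tracker  = " + conf.get("mapred.job.tracker"));
          System.out.println("mapred.map.tasks    = " + conf.get("mapred.map.tasks"));
          System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
      }
  }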

So, can anyone please explain why it was executing 4 map tasks but only
1 reduce task? I have included some important entries from the job.xml
of this job below:

name                                       value
mapred.skip.reduce.max.skip.groups         0
mapred.reduce.max.attempts                 4
mapred.reduce.tasks                        1
mapred.reduce.tasks.speculative.execution  true
mapred.tasktracker.reduce.tasks.maximum    2
dfs.replication                            2
mapred.reduce.copy.backoff                 300

mapred.task.cache.levels                   2
mapred.max.tracker.failures                4
mapred.map.tasks                           4
mapred.map.tasks.speculative.execution     true
mapred.tasktracker.map.tasks.maximum       2

Please help.

Re: My configuration in conf/hadoop-site.xml is not being used. Why?

Posted by Aaron Kimball <aa...@cloudera.com>.
The mapred.map.tasks parameter is used as a hint more than anything else:
the actual number of map tasks is determined by the number of input splits
the InputFormat produces. If there are more input files (and hence splits)
than the hint, you'll get more map tasks. So if you've got four input
files, that's going to be four map tasks.

The value of mapred.reduce.tasks will be taken from the hadoop-site.xml
file on the machine that submits the job -- not the JobTracker's. If those
two machines are separate, the client's hadoop-site.xml will win.
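If you want the reduce count to be independent of whichever
hadoop-site.xml the client happens to read, set it explicitly in the
driver. A minimal old-API sketch (MyDriver and the input/output paths
are hypothetical; with no mapper or reducer set, it runs the identity
mapper and reducer):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class MyDriver {
      public static void main(String[] args) throws Exception {
          JobConf conf = new JobConf(MyDriver.class);
          conf.setJobName("explicit-task-counts");

          FileInputFormat.setInputPaths(conf, new Path("in"));    // hypothetical input dir
          FileOutputFormat.setOutputPath(conf, new Path("out"));  // hypothetical output dir

          conf.setNumReduceTasks(2); // honored as given by the framework
          conf.setNumMapTasks(2);    // only a hint; actual count follows the input splits

          JobClient.runJob(conf);
      }
  }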

- Aaron
