Posted to common-user@hadoop.apache.org by Foss User <fo...@gmail.com> on 2009/05/19 13:49:56 UTC

Number of maps and reduces not obeying my configuration

I ran a job. In the jobtracker web interface, I found 4 maps and 1
reduce running. This is not what I set in my configuration files
(hadoop-site.xml).

My configuration file is set as follows:

mapred.map.tasks = 2
mapred.reduce.tasks = 2

However, the description of these properties mentions that these
settings are ignored if mapred.job.tracker is set to 'local'.
Mine is set properly with an IP address and port number. Please note that
the above configuration is from the 'conf/hadoop-site.xml' file of the
job tracker node.
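
In the usual hadoop-site.xml XML syntax, those two settings look like
this:

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>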

So, can anyone please explain why it was executing 4 maps but only 1
reduce? I have included some important entries from the job.xml of
this job below:

name	value
mapred.skip.reduce.max.skip.groups	0
mapred.reduce.max.attempts	4
mapred.reduce.tasks	1
mapred.reduce.tasks.speculative.execution	true
mapred.tasktracker.reduce.tasks.maximum	2
dfs.replication	2
mapred.reduce.copy.backoff	300

mapred.task.cache.levels	2
mapred.max.tracker.failures	4
mapred.map.tasks	4
mapred.map.tasks.speculative.execution	true
mapred.tasktracker.map.tasks.maximum	2

Please help.

Re: Number of maps and reduces not obeying my configuration

Posted by Tom White <to...@cloudera.com>.
On Thu, May 21, 2009 at 5:18 AM, Foss User <fo...@gmail.com> wrote:
> On Wed, May 20, 2009 at 3:18 PM, Tom White <to...@cloudera.com> wrote:
>> The number of maps to use is calculated on the client, since splits
>> are computed on the client, so changing the value of mapred.map.tasks
>> only on the jobtracker will not have any effect.
>>
>> Note that the number of map tasks that you set is only a suggestion,
>> and depends on the number of splits actually created. In your case it
>> looks like 4 splits were created. As a rule, you shouldn't set the
>> number of map tasks, since by default one map task is created for each
>> HDFS block, which works well for most applications. This is explained
>> further in the javadoc:
>> http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
>>
>> The number of reduces to use is determined by the JobConf that is
>> created on the client, so it uses the client's hadoop-site.xml, not
>> the jobtracker's one. This is why it is set to 1, even though you set
>> it to 2 on the jobtracker.
>>
>> If you don't want to set configuration properties in code (and I agree
>> it's often a good idea not to hardcode things like the number of maps
>> or reduces in code), then you can make your driver use Tool and
>> ToolRunner as Chuck explained.
>>
>> Finally, in general you should try to keep hadoop-site.xml the same
>> across your clients and cluster nodes to avoid surprises about which
>> value has been set.
>>
>> Hope this helps,
>>
>> Tom
>
> By client, do you mean the machine where I logged in and invoked the
> 'hadoop jar' command to submit and run my job?

Yes.

Re: Number of maps and reduces not obeying my configuration

Posted by Foss User <fo...@gmail.com>.
On Wed, May 20, 2009 at 3:18 PM, Tom White <to...@cloudera.com> wrote:
> The number of maps to use is calculated on the client, since splits
> are computed on the client, so changing the value of mapred.map.tasks
> only on the jobtracker will not have any effect.
>
> Note that the number of map tasks that you set is only a suggestion,
> and depends on the number of splits actually created. In your case it
> looks like 4 splits were created. As a rule, you shouldn't set the
> number of map tasks, since by default one map task is created for each
> HDFS block, which works well for most applications. This is explained
> further in the javadoc:
> http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
>
> The number of reduces to use is determined by the JobConf that is
> created on the client, so it uses the client's hadoop-site.xml, not
> the jobtracker's one. This is why it is set to 1, even though you set
> it to 2 on the jobtracker.
>
> If you don't want to set configuration properties in code (and I agree
> it's often a good idea not to hardcode things like the number of maps
> or reduces in code), then you can make your driver use Tool and
> ToolRunner as Chuck explained.
>
> Finally, in general you should try to keep hadoop-site.xml the same
> across your clients and cluster nodes to avoid surprises about which
> value has been set.
>
> Hope this helps,
>
> Tom

By client, do you mean the machine where I logged in and invoked the
'hadoop jar' command to submit and run my job?

Re: Number of maps and reduces not obeying my configuration

Posted by Tom White <to...@cloudera.com>.
The number of maps to use is calculated on the client, since splits
are computed on the client, so changing the value of mapred.map.tasks
only on the jobtracker will not have any effect.

Note that the number of map tasks that you set is only a suggestion,
and depends on the number of splits actually created. In your case it
looks like 4 splits were created. As a rule, you shouldn't set the
number of map tasks, since by default one map task is created for each
HDFS block, which works well for most applications. This is explained
further in the javadoc:
http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
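
As a rough worked example (assuming the default 64 MB HDFS block size;
your dfs.block.size may differ): a 200 MB input file occupies
ceil(200 / 64) = 4 blocks, which would give exactly the 4 map tasks you
are seeing.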

The number of reduces to use is determined by the JobConf that is
created on the client, so it uses the client's hadoop-site.xml, not
the jobtracker's one. This is why it is set to 1, even though you set
it to 2 on the jobtracker.

If you don't want to set configuration properties in code (and I agree
it's often a good idea not to hardcode things like the number of maps
or reduces in code), then you can make your driver use Tool and
ToolRunner as Chuck explained.
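
A minimal sketch of such a driver (the class and job names here are made
up, mapper/reducer setup is omitted, and this uses the old
org.apache.hadoop.mapred API from 0.19):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() already reflects any -D options parsed by ToolRunner, so
    // "-D mapred.reduce.tasks=2" takes effect with no code changes here.
    JobConf conf = new JobConf(getConf(), MyDriver.class);
    conf.setJobName("my-job");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options (-D, -conf, -fs, ...) from
    // the argument list before handing the rest to run().
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}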

Finally, in general you should try to keep hadoop-site.xml the same
across your clients and cluster nodes to avoid surprises about which
value has been set.

Hope this helps,

Tom

On Wed, May 20, 2009 at 5:21 AM, Foss User <fo...@gmail.com> wrote:
> On Wed, May 20, 2009 at 3:39 AM, Chuck Lam <ch...@gmail.com> wrote:
>> Can you set the number of reducers to zero and see if it becomes a map-only
>> job? If it does, then it's able to read in the mapred.reduce.tasks property
>> correctly but just refuses to have 2 reducers. In that case, it's most likely
>> you're running in local mode, which doesn't allow more than 1 reducer.
>
> As I have already mentioned in my original mail, I am not running it
> in local mode. Quoting from my original mail:
>
> "My configuration file is set as follows:
>
> mapred.map.tasks = 2
> mapred.reduce.tasks = 2
>
> However, the description of these properties mentions that these
> settings are ignored if mapred.job.tracker is set to 'local'.
> Mine is set properly with an IP address and port number."
>
>>
>> If setting it to zero doesn't change anything, then your config file is not being
>> read, or it's being overridden.
>>
>> As an aside, if you use ToolRunner in your Hadoop program, then it will
>> support generic options such that you can run your program with the option
>> -D mapred.reduce.tasks=2
>> to tell it to use 2 reducers. This allows you to set the number of reducers
>> on a per-job basis.
>>
>>
>
> I understand that it is being overridden by something else. What I
> want to know is which file is overriding it. Also, please note that I
> have these settings only in the conf/hadoop-site.xml of the job tracker
> node. Is that enough?
>

Re: Number of maps and reduces not obeying my configuration

Posted by Foss User <fo...@gmail.com>.
On Wed, May 20, 2009 at 3:39 AM, Chuck Lam <ch...@gmail.com> wrote:
> Can you set the number of reducers to zero and see if it becomes a map-only
> job? If it does, then it's able to read in the mapred.reduce.tasks property
> correctly but just refuses to have 2 reducers. In that case, it's most likely
> you're running in local mode, which doesn't allow more than 1 reducer.

As I have already mentioned in my original mail, I am not running it
in local mode. Quoting from my original mail:

"My configuration file is set as follows:

mapred.map.tasks = 2
mapred.reduce.tasks = 2

However, the description of these properties mentions that these
settings are ignored if mapred.job.tracker is set to 'local'.
Mine is set properly with an IP address and port number."

>
> If setting it to zero doesn't change anything, then your config file is not being
> read, or it's being overridden.
>
> As an aside, if you use ToolRunner in your Hadoop program, then it will
> support generic options such that you can run your program with the option
> -D mapred.reduce.tasks=2
> to tell it to use 2 reducers. This allows you to set the number of reducers
> on a per-job basis.
>
>

I understand that it is being overridden by something else. What I
want to know is which file is overriding it. Also, please note that I
have these settings only in the conf/hadoop-site.xml of the job tracker
node. Is that enough?

Re: Number of maps and reduces not obeying my configuration

Posted by Chuck Lam <ch...@gmail.com>.
Can you set the number of reducers to zero and see if it becomes a map-only
job? If it does, then it's able to read in the mapred.reduce.tasks property
correctly but just refuses to have 2 reducers. In that case, it's most likely
you're running in local mode, which doesn't allow more than 1 reducer.

If setting it to zero doesn't change anything, then your config file is not being
read, or it's being overridden.

As an aside, if you use ToolRunner in your Hadoop program, then it will
support generic options such that you can run your program with the option
-D mapred.reduce.tasks=2
to tell it to use 2 reducers. This allows you to set the number of reducers
on a per-job basis.
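
For example (the jar, class, and path names here are made up):

hadoop jar my-job.jar com.example.MyDriver -D mapred.reduce.tasks=2 in out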



On Tue, May 19, 2009 at 1:54 PM, Foss User <fo...@gmail.com> wrote:

> On Wed, May 20, 2009 at 1:52 AM, Piotr Praczyk <pi...@gmail.com>
> wrote:
> > After your first mail I understood that you were providing an additional
> > job.xml (which can be done).
> > What version of Hadoop do you use? In 0.20 there was some change in
> > the configuration files - as far as I understood from the messages,
> > hadoop-site.xml was split into a few other files... where the overriding
> > settings can reside.
> >
> >
> > Piotr
>
> I am using Hadoop 0.19.1. Could you please tell me how I can
> troubleshoot this issue? I need to run 2 maps and 2 reducers, and I
> have to set this configuration somewhere. Currently, it is set in
> conf/hadoop-site.xml but it seems to be overridden.
>

Re: Number of maps and reduces not obeying my configuration

Posted by Foss User <fo...@gmail.com>.
On Wed, May 20, 2009 at 1:52 AM, Piotr Praczyk <pi...@gmail.com> wrote:
> After your first mail I understood that you were providing an additional
> job.xml (which can be done).
> What version of Hadoop do you use? In 0.20 there was some change in
> the configuration files - as far as I understood from the messages,
> hadoop-site.xml was split into a few other files... where the overriding
> settings can reside.
>
>
> Piotr

I am using Hadoop 0.19.1. Could you please tell me how I can
troubleshoot this issue? I need to run 2 maps and 2 reducers, and I
have to set this configuration somewhere. Currently, it is set in
conf/hadoop-site.xml but it seems to be overridden.

Re: Number of maps and reduces not obeying my configuration

Posted by Piotr Praczyk <pi...@gmail.com>.
After your first mail I understood that you were providing an additional
job.xml (which can be done).
What version of Hadoop do you use? In 0.20 there was some change in
the configuration files - as far as I understood from the messages,
hadoop-site.xml was split into a few other files... where the overriding
settings can reside.


Piotr

2009/5/19 Foss User <fo...@gmail.com>

> On Tue, May 19, 2009 at 8:23 PM, He Chen <ai...@gmail.com> wrote:
> > I think they are not overridden every time. If you do not give any
> > configuration in your source code, hadoop-site.xml helps you
> > configure the framework. At the same time, you will not configure all the
> > parameters of the Hadoop framework in your program, so hadoop-site.xml
> > helps.
>
> Then we are back to the question I asked in my original mail. I have
> not specified any configuration in code. So, the configuration in
> hadoop-site.xml should be used. Then why is it not being used?
>

Re: Number of maps and reduces not obeying my configuration

Posted by Foss User <fo...@gmail.com>.
On Tue, May 19, 2009 at 8:23 PM, He Chen <ai...@gmail.com> wrote:
> I think they are not overridden every time. If you do not give any
> configuration in your source code, hadoop-site.xml helps you
> configure the framework. At the same time, you will not configure all the
> parameters of the Hadoop framework in your program, so hadoop-site.xml helps.

Then we are back to the question I asked in my original mail. I have
not specified any configuration in code. So, the configuration in
hadoop-site.xml should be used. Then why is it not being used?

Re: Number of maps and reduces not obeying my configuration

Posted by He Chen <ai...@gmail.com>.
I think they are not overridden every time. If you do not give any
configuration in your source code, hadoop-site.xml helps you
configure the framework. At the same time, you will not configure all the
parameters of the Hadoop framework in your program, so hadoop-site.xml helps.

On Tue, May 19, 2009 at 9:46 AM, Foss User <fo...@gmail.com> wrote:

> On Tue, May 19, 2009 at 8:04 PM, He Chen <ai...@gmail.com> wrote:
> > Change the following parameters
> > mapred.reduce.max.attempts      4
> > mapred.reduce.tasks     1
> > to
> > mapred.reduce.max.attempts      2
> > mapred.reduce.tasks     2
> > in your program source code!
>
> If these parameters in hadoop-site.xml are always going to be
> overridden, then what is the use of having these properties in
> hadoop-site.xml?
>
> I don't want to put these properties in my program source code, so that
> the program can be run on any cluster, large or medium. I want these
> settings to live in the cluster's configuration.
>
> Can someone tell me why the job XML has different values than what
> I have specified in hadoop-site.xml? Who overrides it, and why?
>



-- 
Chen He
RCF CSE Dept.
University of Nebraska-Lincoln
US

Re: Number of maps and reduces not obeying my configuration

Posted by Foss User <fo...@gmail.com>.
On Tue, May 19, 2009 at 8:04 PM, He Chen <ai...@gmail.com> wrote:
> Change the following parameters
> mapred.reduce.max.attempts      4
> mapred.reduce.tasks     1
> to
> mapred.reduce.max.attempts      2
> mapred.reduce.tasks     2
> in your program source code!

If these parameters in hadoop-site.xml are always going to be
overridden, then what is the use of having these properties in
hadoop-site.xml?

I don't want to put these properties in my program source code, so that
the program can be run on any cluster, large or medium. I want these
settings to live in the cluster's configuration.

Can someone tell me why the job XML has different values than what
I have specified in hadoop-site.xml? Who overrides it, and why?

Re: Number of maps and reduces not obeying my configuration

Posted by He Chen <ai...@gmail.com>.
Change the following parameters
mapred.reduce.max.attempts      4
mapred.reduce.tasks     1
to
mapred.reduce.max.attempts      2
mapred.reduce.tasks     2
in your program source code!
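
A minimal sketch with the 0.19 JobConf API, where conf is your job's
JobConf:

conf.setNumMapTasks(2);     // only a hint; the real count follows the input splits
conf.setNumReduceTasks(2);  // authoritative for this job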

On Tue, May 19, 2009 at 9:14 AM, Foss User <fo...@gmail.com> wrote:

> On Tue, May 19, 2009 at 5:32 PM, Piotr Praczyk <pi...@gmail.com>
> wrote:
> > Hi
> >
> > Your job configuration file specifies exactly the numbers of mappers and
> > reducers that are running in your system. The job configuration overrides
> > the site configuration (if parameters are not marked as final), as far as
> > I know.
> >
> >
> > Piotr
>
> So, how do I control the job configuration? I thought specifying the
> number of mappers and reducers in hadoop-site.xml was enough.
> Please tell me which file I should edit to control the job
> configuration.
>



-- 
Chen He
RCF CSE Dept.
University of Nebraska-Lincoln
US

Re: Number of maps and reduces not obeying my configuration

Posted by Foss User <fo...@gmail.com>.
On Tue, May 19, 2009 at 5:32 PM, Piotr Praczyk <pi...@gmail.com> wrote:
> Hi
>
> Your job configuration file specifies exactly the numbers of mappers and
> reducers that are running in your system. The job configuration overrides
> the site configuration (if parameters are not marked as final), as far as
> I know.
>
>
> Piotr

So, how do I control the job configuration? I thought specifying the
number of mappers and reducers in hadoop-site.xml was enough.
Please tell me which file I should edit to control the job
configuration.

Re: Number of maps and reduces not obeying my configuration

Posted by Piotr Praczyk <pi...@gmail.com>.
Hi

Your job configuration file specifies exactly the numbers of mappers and
reducers that are running in your system. The job configuration overrides
the site configuration (if parameters are not marked as final), as far as
I know.
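
As a side note, if a cluster administrator wants to pin a value so that
job configurations cannot override it, the site file can mark the
parameter as final. A minimal hadoop-site.xml sketch:

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <final>true</final>
</property>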


Piotr

2009/5/19 Foss User <fo...@gmail.com>

> I ran a job. In the jobtracker web interface, I found 4 maps and 1
> reduce running. This is not what I set in my configuration files
> (hadoop-site.xml).
>
> My configuration file is set as follows:
>
> mapred.map.tasks = 2
> mapred.reduce.tasks = 2
>
> However, the description of these properties mentions that these
> settings are ignored if mapred.job.tracker is set to 'local'.
> Mine is set properly with an IP address and port number. Please note that
> the above configuration is from the 'conf/hadoop-site.xml' file of the
> job tracker node.
>
> So, can anyone please explain why it was executing 4 maps but only 1
> reduce? I have included some important entries from the job.xml of
> this job below:
>
> name    value
> mapred.skip.reduce.max.skip.groups      0
> mapred.reduce.max.attempts      4
> mapred.reduce.tasks     1
> mapred.reduce.tasks.speculative.execution       true
> mapred.tasktracker.reduce.tasks.maximum 2
> dfs.replication 2
> mapred.reduce.copy.backoff      300
>
> mapred.task.cache.levels        2
> mapred.max.tracker.failures     4
> mapred.map.tasks        4
> mapred.map.tasks.speculative.execution  true
> mapred.tasktracker.map.tasks.maximum    2
>
> Please help.
>