Posted to mapreduce-user@hadoop.apache.org by Rob Stewart <ro...@googlemail.com> on 2010/02/18 16:39:09 UTC

Setting #Reducers at runtime

Hi there,

I am using Hadoop 0.20.1 and I am trying to submit jobs to the cluster as
jars.

Here's what I'm trying to do:

> hadoop jar $hadoopPath/hadoop-*-examples.jar join -Dmapred.reduce.tasks=10
-inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat  -outKey
org.apache.hadoop.io.Text file1.dat file2.dat output.dat


However, my request for 10 reducers for the job is not being honoured by
Hadoop, which instead chooses some other number. Where am I going wrong
here? I do not want to have to change this value in hadoop/conf/*.xml
files, as I am attempting to show the expressive power of Hadoop. Note:
specifying the number of reducers is possible in both Pig and Hive.
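
For what it's worth, this is the pattern I assumed the examples were using,
where ToolRunner folds the -D options into the configuration before the job
is set up (a minimal sketch only; MyJoin is a placeholder class name, not
the actual example):
----------------
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJoin extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // ToolRunner has already run GenericOptionsParser over the args,
    // so -Dmapred.reduce.tasks=10 is in getConf() by this point.
    JobConf jobConf = new JobConf(getConf(), MyJoin.class);
    // If nothing overrides it, the job would run with 10 reducers:
    System.out.println("reducers: " + jobConf.getNumReduceTasks());
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyJoin(), args));
  }
}
-----------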


Thanks,


Rob Stewart

Re: Setting #Reducers at runtime

Posted by Rob Stewart <ro...@googlemail.com>.
OK, thanks for letting me know.

I'll make a tiny change to this code to allow the number of reducers to be
passed in as a parameter, and rerun my experiments.
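
Probably something along these lines (an untested sketch; test.sort.reduces
is a property name I am inventing for the purpose):
----------------
// Take the reducer count from an explicit property when one is given,
// otherwise keep the old default of 90% of the cluster's reduce slots.
int num_reduces = jobConf.getInt("test.sort.reduces",
                                 (int) (cluster.getMaxReduceTasks() * 0.9));
jobConf.setNumReduceTasks(num_reduces);
-----------
and then invoke it with -Dtest.sort.reduces=10 on the command line.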


Thanks, Eric,

Rob



On 18 February 2010 16:37, E. Sammer <er...@lifeless.net> wrote:

> On 2/18/10 11:24 AM, Rob Stewart wrote:
>
>> Hi Eric, thanks.
>>
>> It appears not:
>> ----------------
>>     JobConf jobConf = new JobConf(getConf(), Sort.class);
>>     jobConf.setJobName("join");
>>
>>     jobConf.setMapperClass(IdentityMapper.class);
>>     jobConf.setReducerClass(IdentityReducer.class);
>>
>>     JobClient client = new JobClient(jobConf);
>>     ClusterStatus cluster = client.getClusterStatus();
>>     int num_maps = cluster.getTaskTrackers() *
>>                    jobConf.getInt("test.sort.maps_per_host", 10);
>>     int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
>>     String sort_reduces = jobConf.get("test.sort.reduces_per_host");
>>     if (sort_reduces != null) {
>>        num_reduces = cluster.getTaskTrackers() *
>>                        Integer.parseInt(sort_reduces);
>>     }
>>
>>     jobConf.setNumReduceTasks(num_reduces);
>>
>> -----------
>>
>> Any idea why my parameter for reduce tasks is being ignored?
>>
>
> Rob:
>
> The example code is setting the number of reducers itself. See the line:
>
> jobConf.setNumReduceTasks(num_reduces)
>
> In short, you can't control the number of reducers this code uses from the
> command line.
>
>
> --
> Eric Sammer
> eric@lifeless.net
> http://esammer.blogspot.com
>

Re: Setting #Reducers at runtime

Posted by "E. Sammer" <er...@lifeless.net>.
On 2/18/10 11:24 AM, Rob Stewart wrote:
> Hi Eric, thanks.
>
> It appears not:
> ----------------
>      JobConf jobConf = new JobConf(getConf(), Sort.class);
>      jobConf.setJobName("join");
>
>      jobConf.setMapperClass(IdentityMapper.class);
>      jobConf.setReducerClass(IdentityReducer.class);
>
>      JobClient client = new JobClient(jobConf);
>      ClusterStatus cluster = client.getClusterStatus();
>      int num_maps = cluster.getTaskTrackers() *
>                     jobConf.getInt("test.sort.maps_per_host", 10);
>      int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
>      String sort_reduces = jobConf.get("test.sort.reduces_per_host");
>      if (sort_reduces != null) {
>         num_reduces = cluster.getTaskTrackers() *
>                         Integer.parseInt(sort_reduces);
>      }
>
>      jobConf.setNumReduceTasks(num_reduces);
>
> -----------
>
> Any idea why my parameter for reduce tasks is being ignored?

Rob:

The example code is setting the number of reducers itself. See the line:

jobConf.setNumReduceTasks(num_reduces)

In short, you can't control the number of reducers this code uses from 
the command line.
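
Annotated, the sequence in the snippet you posted is roughly this (the
comments are mine, not the example's):
----------------
// By here, GenericOptionsParser has already applied
// -Dmapred.reduce.tasks=10 to the configuration.
JobConf jobConf = new JobConf(getConf(), Sort.class);

// The example then derives its own reducer count from the cluster...
int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);

// ...and this explicit setter replaces the value the -D option set.
jobConf.setNumReduceTasks(num_reduces);
-----------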

-- 
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com

Re: Setting #Reducers at runtime

Posted by Rob Stewart <ro...@googlemail.com>.
Hi Eric, thanks.

It appears not:
----------------
    JobConf jobConf = new JobConf(getConf(), Sort.class);
    jobConf.setJobName("join");

    jobConf.setMapperClass(IdentityMapper.class);
    jobConf.setReducerClass(IdentityReducer.class);

    JobClient client = new JobClient(jobConf);
    ClusterStatus cluster = client.getClusterStatus();
    int num_maps = cluster.getTaskTrackers() *
                   jobConf.getInt("test.sort.maps_per_host", 10);
    int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
    String sort_reduces = jobConf.get("test.sort.reduces_per_host");
    if (sort_reduces != null) {
       num_reduces = cluster.getTaskTrackers() *
                       Integer.parseInt(sort_reduces);
    }

    jobConf.setNumReduceTasks(num_reduces);

-----------

Any idea why my parameter for reduce tasks is being ignored?

thanks,

Rob


On 18 February 2010 16:08, E. Sammer <er...@lifeless.net> wrote:

> On 2/18/10 10:39 AM, Rob Stewart wrote:
>
>> Hi there,
>>
>> I am using Hadoop 0.20.1 and I am trying to submit jobs to the cluster
>> as jars.
>>
>> Here's what I'm trying to do:
>>
>>  > hadoop jar $hadoopPath/hadoop-*-examples.jar join
>> -Dmapred.reduce.tasks=10 -inFormat
>> org.apache.hadoop.mapred.KeyValueTextInputFormat  -outKey
>> org.apache.hadoop.io.Text file1.dat file2.dat output.dat
>>
>>
>> However, my request for 10 reducers for the job is not being honoured
>> by Hadoop, which instead chooses some other number. Where am I going
>> wrong here? I do not want to have to change this value in
>> hadoop/conf/*.xml files, as I am attempting to show the expressive
>> power of Hadoop. Note: specifying the number of reducers is possible
>> in both Pig and Hive.
>>
>>
>> Thanks,
>>
>>
>> Rob Stewart
>>
>
> Rob:
>
> It's possible that something inside the jar is calling
> JobConf.setNumReduceTasks(x) after it parses the command line args. That
> would cause this type of behavior. I haven't looked at the source for the
> join example to confirm this, though.
>
> Regards.
> --
> Eric Sammer
> eric@lifeless.net
> http://esammer.blogspot.com
>

Re: Setting #Reducers at runtime

Posted by "E. Sammer" <er...@lifeless.net>.
On 2/18/10 10:39 AM, Rob Stewart wrote:
> Hi there,
>
> I am using Hadoop 0.20.1 and I am trying to submit jobs to the cluster
> as jars.
>
> Here's what I'm trying to do:
>
>  > hadoop jar $hadoopPath/hadoop-*-examples.jar join
> -Dmapred.reduce.tasks=10 -inFormat
> org.apache.hadoop.mapred.KeyValueTextInputFormat  -outKey
> org.apache.hadoop.io.Text file1.dat file2.dat output.dat
>
>
> However, my request for 10 reducers for the job is not being honoured
> by Hadoop, which instead chooses some other number. Where am I going
> wrong here? I do not want to have to change this value in
> hadoop/conf/*.xml files, as I am attempting to show the expressive
> power of Hadoop. Note: specifying the number of reducers is possible
> in both Pig and Hive.
>
>
> Thanks,
>
>
> Rob Stewart

Rob:

It's possible that something inside the jar is calling
JobConf.setNumReduceTasks(x) after it parses the command line args. That
would cause this type of behavior. I haven't looked at the source for
the join example to confirm this, though.
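
By way of illustration only (a hypothetical fragment, with someValue
standing in for whatever the driver computes):
----------------
// GenericOptionsParser applies -Dmapred.reduce.tasks=10 first, so:
jobConf.getNumReduceTasks();           // would return 10 at this point
// ...but an explicit setter later in the driver takes precedence:
jobConf.setNumReduceTasks(someValue);  // the -D value is silently lost
-----------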

Regards.
-- 
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com