Posted to mapreduce-user@hadoop.apache.org by Rob Stewart <ro...@googlemail.com> on 2010/02/18 16:39:09 UTC
Setting #Reducers at runtime
Hi there,
I am using Hadoop 0.20.1 and I am trying to submit jobs to the cluster as
jars.
Here's what I'm trying to do:
> hadoop jar $hadoopPath/hadoop-*-examples.jar join -Dmapred.reduce.tasks=10
-inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat -outKey
org.apache.hadoop.io.Text file1.dat file2.dat output.dat
However, my request for 10 reducers is not being honoured by Hadoop, which
chooses some other number instead. Where am I going wrong here? I do not
want to have to change this value in the hadoop/conf/*.xml files, as I am
attempting to show the expressive power of Hadoop. Note: specifying the
number of reducers at runtime is possible in both Pig and Hive.
Thanks,
Rob Stewart
Re: Setting #Reducers at runtime
Posted by Rob Stewart <ro...@googlemail.com>.
OK, thanks for letting me know.
I'll make a tiny change to this code to allow reducers as a parameter, and
rerun my experiments.
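Something like this, I think (untested sketch; "join.reduces" is just a
name I'm inventing for a new job-specific knob, and it assumes the job is
launched through ToolRunner so that -D properties land in the conf):

// Replace the unconditional setNumReduceTasks(num_reduces) with:
// -1 is a sentinel meaning "not given on the command line".
int num_reduces = jobConf.getInt("join.reduces", -1);
if (num_reduces < 0) {
  // Fall back to the example's old heuristic: ~90% of the reduce slots.
  num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
}
jobConf.setNumReduceTasks(num_reduces);

Then -Djoin.reduces=10 on the command line should stick.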
Thanks Eric,
Rob
On 18 February 2010 16:37, E. Sammer <er...@lifeless.net> wrote:
> On 2/18/10 11:24 AM, Rob Stewart wrote:
>
>> Hi Eric, thanks.
>>
>> It appears not:
>> ----------------
>> JobConf jobConf = new JobConf(getConf(), Sort.class);
>> jobConf.setJobName("join");
>>
>> jobConf.setMapperClass(IdentityMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> JobClient client = new JobClient(jobConf);
>> ClusterStatus cluster = client.getClusterStatus();
>> int num_maps = cluster.getTaskTrackers() *
>>     jobConf.getInt("test.sort.maps_per_host", 10);
>> int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
>> String sort_reduces = jobConf.get("test.sort.reduces_per_host");
>> if (sort_reduces != null) {
>>   num_reduces = cluster.getTaskTrackers() *
>>       Integer.parseInt(sort_reduces);
>> }
>>
>> jobConf.setNumReduceTasks(num_reduces);
>>
>> -----------
>>
>> Any idea why my parameter for reduce tasks is being ignored?
>>
>
> Rob:
>
> It is setting the number of reducers itself. See the line:
>
> jobConf.setNumReduceTasks(num_reduces)
>
> In short, you can't control the number of reducers this code uses from the
> command line.
>
>
> --
> Eric Sammer
> eric@lifeless.net
> http://esammer.blogspot.com
>
Re: Setting #Reducers at runtime
Posted by "E. Sammer" <er...@lifeless.net>.
On 2/18/10 11:24 AM, Rob Stewart wrote:
> Hi Eric, thanks.
>
> It appears not:
> ----------------
> JobConf jobConf = new JobConf(getConf(), Sort.class);
> jobConf.setJobName("join");
>
> jobConf.setMapperClass(IdentityMapper.class);
> jobConf.setReducerClass(IdentityReducer.class);
>
> JobClient client = new JobClient(jobConf);
> ClusterStatus cluster = client.getClusterStatus();
> int num_maps = cluster.getTaskTrackers() *
>     jobConf.getInt("test.sort.maps_per_host", 10);
> int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
> String sort_reduces = jobConf.get("test.sort.reduces_per_host");
> if (sort_reduces != null) {
>   num_reduces = cluster.getTaskTrackers() *
>       Integer.parseInt(sort_reduces);
> }
>
> jobConf.setNumReduceTasks(num_reduces);
>
> -----------
>
> Any idea why my parameter for reduce tasks is being ignored?
Rob:
It is setting the number of reducers itself. See the line:
jobConf.setNumReduceTasks(num_reduces)
In short, you can't control the number of reducers this code uses from
the command line.
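That said, judging from the code you pasted, it does read
test.sort.reduces_per_host out of the config, so (untested) you may be
able to steer the reducer count without touching the source:

hadoop jar $hadoopPath/hadoop-*-examples.jar join \
  -Dtest.sort.reduces_per_host=2 \
  -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -outKey org.apache.hadoop.io.Text file1.dat file2.dat output.dat

Note that this is reducers per tasktracker (the code multiplies it by the
number of tasktrackers), not an absolute count.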
--
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com
Re: Setting #Reducers at runtime
Posted by Rob Stewart <ro...@googlemail.com>.
Hi Eric, thanks.
It appears not:
----------------
JobConf jobConf = new JobConf(getConf(), Sort.class);
jobConf.setJobName("join");

jobConf.setMapperClass(IdentityMapper.class);
jobConf.setReducerClass(IdentityReducer.class);

JobClient client = new JobClient(jobConf);
ClusterStatus cluster = client.getClusterStatus();
int num_maps = cluster.getTaskTrackers() *
    jobConf.getInt("test.sort.maps_per_host", 10);
int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
String sort_reduces = jobConf.get("test.sort.reduces_per_host");
if (sort_reduces != null) {
  num_reduces = cluster.getTaskTrackers() *
      Integer.parseInt(sort_reduces);
}

jobConf.setNumReduceTasks(num_reduces);
-----------
Any idea why my parameter for reduce tasks is being ignored?
thanks,
Rob
On 18 February 2010 16:08, E. Sammer <er...@lifeless.net> wrote:
> On 2/18/10 10:39 AM, Rob Stewart wrote:
>
>> Hi there,
>>
>> I am using Hadoop 0.20.1 and I am trying to submit jobs to the cluster
>> as jars.
>>
>> Here's what I'm trying to do:
>>
>> > hadoop jar $hadoopPath/hadoop-*-examples.jar join
>> -Dmapred.reduce.tasks=10 -inFormat
>> org.apache.hadoop.mapred.KeyValueTextInputFormat -outKey
>> org.apache.hadoop.io.Text file1.dat file2.dat output.dat
>>
>>
>> However, my request for 10 reducers is not being honoured by Hadoop,
>> which chooses some other number instead. Where am I going wrong here?
>> I do not want to have to change this value in the hadoop/conf/*.xml
>> files, as I am attempting to show the expressive power of Hadoop.
>> Note: specifying the number of reducers at runtime is possible in both
>> Pig and Hive.
>>
>>
>> Thanks,
>>
>>
>> Rob Stewart
>>
>
> Rob:
>
> It's possible that something inside the jar is calling
> JobConf.setNumReduceTasks(x) after it parses the command-line args. That
> would cause this type of behavior. I haven't looked at the source for the
> join example to confirm this, though.
>
> Regards.
> --
> Eric Sammer
> eric@lifeless.net
> http://esammer.blogspot.com
>
Re: Setting #Reducers at runtime
Posted by "E. Sammer" <er...@lifeless.net>.
On 2/18/10 10:39 AM, Rob Stewart wrote:
> Hi there,
>
> I am using Hadoop 0.20.1 and I am trying to submit jobs to the cluster
> as jars.
>
> Here's what I'm trying to do:
>
> > hadoop jar $hadoopPath/hadoop-*-examples.jar join
> -Dmapred.reduce.tasks=10 -inFormat
> org.apache.hadoop.mapred.KeyValueTextInputFormat -outKey
> org.apache.hadoop.io.Text file1.dat file2.dat output.dat
>
>
> However, my request for 10 reducers is not being honoured by Hadoop,
> which chooses some other number instead. Where am I going wrong here?
> I do not want to have to change this value in the hadoop/conf/*.xml
> files, as I am attempting to show the expressive power of Hadoop.
> Note: specifying the number of reducers at runtime is possible in both
> Pig and Hive.
>
>
> Thanks,
>
>
> Rob Stewart
Rob:
It's possible that something inside the jar is calling
JobConf.setNumReduceTasks(x) after it parses the command-line args. That
would cause this type of behavior. I haven't looked at the source for
the join example to confirm this, though.
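For comparison, a driver that goes through ToolRunner and never calls
setNumReduceTasks() itself will pick up -Dmapred.reduce.tasks from the
command line. A minimal sketch (the class and argument handling are
placeholders of mine, not the examples' actual code):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() already contains anything passed with -D on the
    // command line, including mapred.reduce.tasks.
    JobConf jobConf = new JobConf(getConf(), MyJob.class);
    jobConf.setJobName("join");
    // No setNumReduceTasks() call here, so the -D value survives.
    FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
    FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
    JobClient.runJob(jobConf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner invokes GenericOptionsParser, which strips -D
    // arguments and folds them into the job's Configuration.
    System.exit(ToolRunner.run(new MyJob(), args));
  }
}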
Regards.
--
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com