Posted to common-user@hadoop.apache.org by Ryan Farris <fa...@gmail.com> on 2009/04/27 15:44:51 UTC

.20.0, Partitioners?

Is there some magic to get a Partitioner working on .20.0?  Setting
the partitioner class on the Job object doesn't take; Hadoop always
uses the HashPartitioner.  Looking through the source code, it looks
like the MapOutputBuffer in MapTask only ever fetches
"mapred.partitioner.class" and doesn't check for the new API's
"mapreduce.partitioner.class", but I'm not confident in my
understanding of how things work.
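
To make the suspected mismatch concrete, here is a minimal stand-alone
sketch. It is a toy model and not Hadoop code: a plain Map stands in for
the job configuration, and the class and method names
(PartitionerKeyMismatch, setPartitionerNewApi, resolvePartitioner) are
made up for illustration. Only the two property names and the
HashPartitioner fallback come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the behavior described in the post; not Hadoop source code.
final class PartitionerKeyMismatch {
    // Property names quoted in the post.
    static final String OLD_KEY = "mapred.partitioner.class";
    static final String NEW_KEY = "mapreduce.partitioner.class";
    static final String DEFAULT_PARTITIONER = "HashPartitioner";

    // Hypothetical stand-in for setting the partitioner through the new API:
    // only the new property name is written.
    static void setPartitionerNewApi(Map<String, String> conf, String className) {
        conf.put(NEW_KEY, className);
    }

    // Hypothetical stand-in for the lookup the post attributes to the map side:
    // only the old property name is read, so anything else falls back to the default.
    static String resolvePartitioner(Map<String, String> conf) {
        return conf.getOrDefault(OLD_KEY, DEFAULT_PARTITIONER);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setPartitionerNewApi(conf, "TestPartitioner");
        // The custom class was stored under the new key, which is never read.
        System.out.println(resolvePartitioner(conf)); // prints HashPartitioner
    }
}
```

If the lookup really works this way, a partitioner set through the new
API is silently ignored, which matches the symptom reported here.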

I was eventually able to get my test program working correctly by:
  1) Creating a partitioner that implements the deprecated
org.apache.hadoop.mapred.Partitioner interface.
  2) Calling job.getConfiguration().set("mapred.partitioner.class",
DeprecatedTestPartitioner.class.getCanonicalName());
  3) Commenting out line 395 of org.apache.hadoop.mapreduce.Job.java,
where it asserts that "mapred.partitioner.class" is null
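
Steps 1 and 2 can be sketched without a Hadoop dependency as follows.
This is an illustration under stated assumptions: OldStylePartitioner is
a local stand-in that only mirrors the getPartition(key, value,
numPartitions) shape of the old API, FirstCharPartitioner is a
hypothetical partitioner, and a plain Map replaces
job.getConfiguration(). Step 3 (patching Job.java) is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the workaround steps; a Map stands in for the job configuration.
final class WorkaroundSketch {
    static final String OLD_KEY = "mapred.partitioner.class";

    // Step 2: register the partitioner under the old property name.
    static Map<String, String> configure() {
        Map<String, String> conf = new HashMap<>();
        conf.put(OLD_KEY, FirstCharPartitioner.class.getCanonicalName());
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = configure();
        System.out.println(conf.get(OLD_KEY));
        OldStylePartitioner<String, Integer> p = new FirstCharPartitioner();
        // 'a' is 97, and 97 % 4 == 1
        System.out.println(p.getPartition("apple", 1, 4)); // prints 1
    }
}

// Local stand-in for the old-API partitioner contract (an assumption for
// this sketch, not the real Hadoop type).
interface OldStylePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Step 1: a hypothetical partitioner that routes keys by their first character.
final class FirstCharPartitioner implements OldStylePartitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (key.isEmpty()) {
            return 0; // send empty keys to the first partition
        }
        // A char is never negative, so the result is in [0, numPartitions).
        return key.charAt(0) % numPartitions;
    }
}
```

In a real job the class would implement the actual Hadoop partitioner
type and be registered on the job's configuration, as in step 2.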

But I'm assuming editing the Hadoop core source code is not the
intended path.  Am I missing some simple switch or something?

rf

Re: .20.0, Partitioners?

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
I created

https://issues.apache.org/jira/browse/HADOOP-5750

to follow this up.

Thanks
Jothi


On 4/27/09 10:10 PM, "Jothi Padmanabhan" <jo...@yahoo-inc.com> wrote:

> Ryan,
> 
> I observed this behavior too -- Partitioner does not seem to work with the
> new API, for exactly the reason you mentioned. Until this gets fixed, you
> probably need to use the old API.
> 
> Jothi
> 
> 


Re: .20.0, Partitioners?

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
Ryan,

I observed this behavior too -- Partitioner does not seem to work with the
new API, for exactly the reason you mentioned. Until this gets fixed, you
probably need to use the old API.

Jothi

