Posted to common-user@hadoop.apache.org by David Rosenstrauch <da...@darose.net> on 2010/08/04 17:38:13 UTC

Fwd: Partitioner in Hadoop 0.20

Someone sent this email to the common-user list a while back, but it 
seems to have slipped through the cracks.  We're starting to dig into 
some hard-core Hadoop development and just ran into this same issue.

Anyone know if there's any particular reason why the new Partitioner 
class doesn't implement JobConfigurable?  (And, if not, whether there's 
any plans to fix this omission?)  We're working on a somewhat complex 
partitioner, and it would be extremely helpful to be able to pass it 
some parms via the jobconf.

Thanks,

DR

-------- Original Message --------
Subject: Partitioner in Hadoop 0.20
Date: Wed, 30 Jun 2010 00:05:52 -0400
From: Saptarshi Guha <sa...@gmail.com>
Reply-To: common-user@hadoop.apache.org,	saptarshi.guha@gmail.com
To: common-user@hadoop.apache.org

Hello,

in hadoop 0.20.2 (current), the Partitioner class does not extend
JobConfigurable (as it did in Hadoop pre 0.19).
Does this mean there isn't a way to set some configurable options for the
partitioner?

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Partitioner.html

(old :
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Partitioner.html
)

Regards
Saptarshi


Re: Partitioner in Hadoop 0.20

Posted by Owen O'Malley <om...@apache.org>.
On Aug 4, 2010, at 10:58 AM, David Rosenstrauch wrote:

> So my partitioner needs to implement Configurable, then not  
> JobConfigurable.  Tnx much!

ReflectionUtils.newInstance will use either Configurable or  
JobConfigurable (or both!). So implementing either one will work fine.

-- Owen

Re: Partitioner in Hadoop 0.20

Posted by David Rosenstrauch <da...@darose.net>.
On 08/04/2010 01:55 PM, Wilkes, Chris wrote:
> On Aug 4, 2010, at 10:50 AM, David Rosenstrauch wrote:
>
>> On 08/04/2010 12:30 PM, Owen O'Malley wrote:
>>>
>>> On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote:
>>>
>>>> Anyone know if there's any particular reason why the new Partitioner
>>>> class doesn't implement JobConfigurable? (And, if not, whether there's
>>>> any plans to fix this omission?) We're working on a somewhat complex
>>>> partitioner, and it would be extremely helpful to be able to pass it
>>>> some parms via the jobconf.
>>>
>>> The short answer is that it doesn't need to. If you make your
>>> partitioner either Configured or JobConfigurable, it will be configured.
>>> The API class doesn't depend on it precisely because it is not required
>>> for all partitioners.
>>>
>>> -- Owen
>>
>> ? Not sure I understand correctly ... can you pls clarify?
>>
>> So if I make my custom partitioner implement JobConfigurable, then its
>> configure(JobConf) method will automagically get called and allow me
>> to configure it with info in the jobConf that's passed in? (Note that
>> making it extend from Configured is not an option, since it needs to
>> extend from org.apache.hadoop.mapreduce.Partitioner.)
>>
>
>
> The partitioner is instantiated by ReflectionUtils.newInstance(clazz,
> job), which calls setConf() on the newly created object if
> it implements Configurable.
>
> Chris

So my partitioner needs to implement Configurable, then not 
JobConfigurable.  Tnx much!

DR

Re: Partitioner in Hadoop 0.20

Posted by "Wilkes, Chris" <cw...@gmail.com>.
On Aug 4, 2010, at 10:50 AM, David Rosenstrauch wrote:

> On 08/04/2010 12:30 PM, Owen O'Malley wrote:
>>
>> On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote:
>>
>>> Anyone know if there's any particular reason why the new Partitioner
>>> class doesn't implement JobConfigurable? (And, if not, whether  
>>> there's
>>> any plans to fix this omission?) We're working on a somewhat complex
>>> partitioner, and it would be extremely helpful to be able to pass it
>>> some parms via the jobconf.
>>
>> The short answer is that it doesn't need to. If you make your
>> partitioner either Configured or JobConfigurable, it will be  
>> configured.
>> The API class doesn't depend on it precisely because it is not  
>> required
>> for all partitioners.
>>
>> -- Owen
>
> ?  Not sure I understand correctly ... can you pls clarify?
>
> So if I make my custom partitioner implement JobConfigurable, then  
> its configure(JobConf) method will automagically get called and  
> allow me to configure it with info in the jobConf that's passed in?   
> (Note that making it extend from Configured is not an option, since  
> it needs to extend from org.apache.hadoop.mapreduce.Partitioner.)
>


The partitioner is instantiated by ReflectionUtils.newInstance(clazz,  
job), which calls setConf() on the newly created object  
if it implements Configurable.

Chris
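
[Editor's note] The mechanism described in this message can be sketched in a few lines of plain Java. This is only an illustration, not Hadoop's actual code: SketchConf, SketchConfigurable, SketchPartitioner, ReflectionSketch, PrefixPartitioner and the key "sketch.partitioner.prefix.length" are all hypothetical stand-ins invented here so that the example runs without Hadoop on the classpath. In a real job the partitioner would extend org.apache.hadoop.mapreduce.Partitioner and implement org.apache.hadoop.conf.Configurable instead.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.conf.Configuration (hypothetical, string pairs only).
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    public void set(String key, String value) { values.put(key, value); }

    public int getInt(String key, int defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }
}

// Stand-in for org.apache.hadoop.conf.Configurable (hypothetical).
interface SketchConfigurable {
    void setConf(SketchConf conf);
    SketchConf getConf();
}

// Stand-in for the new-API abstract Partitioner class (hypothetical).
abstract class SketchPartitioner<K> {
    public abstract int getPartition(K key, int numPartitions);
}

// A partitioner that extends the abstract class AND implements the
// configuration interface, which is the combination discussed in the thread.
class PrefixPartitioner extends SketchPartitioner<String> implements SketchConfigurable {
    private SketchConf conf;
    private int prefixLength = 1;

    @Override
    public void setConf(SketchConf conf) {
        this.conf = conf;
        // Read the partitioner's own parameter from the job configuration.
        this.prefixLength = conf.getInt("sketch.partitioner.prefix.length", 1);
    }

    @Override
    public SketchConf getConf() { return conf; }

    @Override
    public int getPartition(String key, int numPartitions) {
        String prefix = key.substring(0, Math.min(prefixLength, key.length()));
        // Mask the sign bit so the result is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Mimics the idea behind ReflectionUtils.newInstance(clazz, conf):
// create the object reflectively, then configure it if it asks to be.
class ReflectionSketch {
    public static <T> T newInstance(Class<T> clazz, SketchConf conf) {
        try {
            Constructor<T> ctor = clazz.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            if (instance instanceof SketchConfigurable) {
                ((SketchConfigurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SketchConf conf = new SketchConf();
        conf.set("sketch.partitioner.prefix.length", "2");
        PrefixPartitioner p = newInstance(PrefixPartitioner.class, conf);
        // "apple" and "apricot" share the two-character prefix "ap".
        System.out.println(p.getPartition("apple", 4) == p.getPartition("apricot", 4)); // prints "true"
    }
}
```

The point of the sketch is that the framework, not the base class, decides whether to configure the object: the partitioner opts in by implementing the interface, so the abstract Partitioner class itself never needs to depend on it.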

Re: Partitioner in Hadoop 0.20

Posted by David Rosenstrauch <da...@darose.net>.
On 08/04/2010 12:30 PM, Owen O'Malley wrote:
>
> On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote:
>
>> Anyone know if there's any particular reason why the new Partitioner
>> class doesn't implement JobConfigurable? (And, if not, whether there's
>> any plans to fix this omission?) We're working on a somewhat complex
>> partitioner, and it would be extremely helpful to be able to pass it
>> some parms via the jobconf.
>
> The short answer is that it doesn't need to. If you make your
> partitioner either Configured or JobConfigurable, it will be configured.
> The API class doesn't depend on it precisely because it is not required
> for all partitioners.
>
> -- Owen

?  Not sure I understand correctly ... can you pls clarify?

So if I make my custom partitioner implement JobConfigurable, then its 
configure(JobConf) method will automagically get called and allow me to 
configure it with info in the jobConf that's passed in?  (Note that 
making it extend from Configured is not an option, since it needs to 
extend from org.apache.hadoop.mapreduce.Partitioner.)

Thanks,

DR

Re: Partitioner in Hadoop 0.20

Posted by Owen O'Malley <om...@apache.org>.
On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote:

> Anyone know if there's any particular reason why the new Partitioner  
> class doesn't implement JobConfigurable?  (And, if not, whether  
> there's any plans to fix this omission?)  We're working on a  
> somewhat complex partitioner, and it would be extremely helpful to  
> be able to pass it some parms via the jobconf.

The short answer is that it doesn't need to. If you make your  
partitioner either Configured or JobConfigurable, it will be  
configured. The API class doesn't depend on it precisely because it is  
not required for all partitioners.

-- Owen
