Posted to common-user@hadoop.apache.org by gmar <gu...@quest.com> on 2010/05/12 08:06:06 UTC

Can a Partitioner access the Reporter?

I'd like to be able to have my customised Partitioner update counters in the
Reporter.
i.e. So that I know how many keys have been sent to each partition.

So, is it possible for the partitioner to obtain a reference to the
reporter?
I guess it'd need to obtain this via the JobConf object it has access to in
the configure() method.

Or is there another way to skin this cat?

tia
-- 
View this message in context: http://old.nabble.com/Can-a-Partitioner-access-the-Reporter--tp28532304p28532304.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Can a Partitioner access the Reporter?

Posted by Owen O'Malley <om...@apache.org>.
On May 11, 2010, at 11:06 PM, gmar wrote:

>
> I'd like to be able to have my customised Partitioner update  
> counters in the
> Reporter.
> i.e. So that I know how many keys have been sent to each partition.
>
> So, is it possible for the partitioner to obtain a reference to the
> reporter?

No, even in the new API where we give access to the context within the  
close method, it isn't passed to the partitioner, unfortunately.

> I guess it'd need to obtain this via the JobConf object it has  
> access to in
> the configure() method.

You can get the JobConf, but you can't get to the Reporter in the
configure method.

> Or is there another way to skin this cat?

Roughly, you need to either set a static in the Mapper.map (in the new  
API use Mapper.setup) or emit a pseudo key or value with it. I'd lean  
toward a static...

-- Owen
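
The static approach suggested above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not code from the thread:
CounterSink is a hypothetical stand-in for the single Reporter method used
(incrCounter with a group, a counter name and an amount), and ReporterHolder,
getPartition and mapAndPartition are made-up names. It relies on the
partitioner being called in the same JVM as the map task, so a static set by
the mapper is visible to it.

```java
import java.util.Map;
import java.util.TreeMap;

public class PartitionCounterSketch {

    /** Stand-in for the one Reporter method used here (assumption, not the real Hadoop interface). */
    interface CounterSink {
        void incrCounter(String group, String counter, long amount);
    }

    /** Static holder: the mapper publishes its reporter here so the partitioner can reach it. */
    static final class ReporterHolder {
        private static volatile CounterSink current;

        static void set(CounterSink sink) { current = sink; }

        static CounterSink get() { return current; }
    }

    /** Hypothetical partitioner logic: hash-partitions the key and bumps a per-partition counter. */
    static int getPartition(String key, int numPartitions) {
        int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        CounterSink sink = ReporterHolder.get();
        if (sink != null) { // null until the mapper has published its reporter
            sink.incrCounter("Partitioner", "partition-" + partition, 1);
        }
        return partition;
    }

    /** Hypothetical map(): publishes the reporter first, then emits (which triggers partitioning). */
    static int mapAndPartition(String key, CounterSink reporter, int numPartitions) {
        ReporterHolder.set(reporter);
        return getPartition(key, numPartitions);
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new TreeMap<>();
        CounterSink reporter = (group, counter, amount) ->
                counters.merge(group + ":" + counter, amount, Long::sum);
        for (String key : new String[] {"a", "b", "c", "a"}) {
            mapAndPartition(key, reporter, 2);
        }
        System.out.println(counters);
    }
}
```

In a real job the holder would be set from Mapper.map in the old API (or
Mapper.setup in the new one), and the null check matters because the
partitioner has nothing to report to until that has happened.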

Re: Can a Partitioner access the Reporter?

Posted by Eric Sammer <es...@cloudera.com>.
I don't believe there's any way to get a reference to a Reporter in
the Partitioner. Using JobConf to pass a complex object like the
Reporter isn't a good idea (and may not even work) because the
reporter can't be serialized and JobConf / Configuration do not take
arbitrary objects in their get() / set() methods.

Another way of looking at this is to say that if the partitioner runs
and divvies up the keys, you can simply count the keys at each reducer
and get the exact same effect. Whether you count them in the
partitioner or the reducer doesn't matter; they should be the same. Of
course, I could see that you might not want to repeat that counting
code in each reducer, but for now, that's the only real option.
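
A rough sketch of the reducer-side counting described above, again under
assumptions: CounterSink is a hypothetical stand-in for the Reporter's
incrCounter method, CountingReducer is a made-up class, and the partition
number is simply passed to the constructor (in the old API it could probably
be read from the mapred.task.partition property in configure()). Note that
this counts one per reduce() call, i.e. distinct keys; to count records
instead, increment once per value.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class ReducerKeyCountSketch {

    /** Stand-in for the one Reporter method used here (assumption, not the real Hadoop interface). */
    interface CounterSink {
        void incrCounter(String group, String counter, long amount);
    }

    /** Hypothetical reducer: counts one key per reduce() call, then sums the values. */
    static final class CountingReducer {
        private final int partition; // which partition this reducer instance handles

        CountingReducer(int partition) {
            this.partition = partition;
        }

        long reduce(String key, Iterator<Long> values, CounterSink reporter) {
            // One increment per distinct key seen by this reducer.
            reporter.incrCounter("KeysPerPartition", "partition-" + partition, 1);
            long sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new TreeMap<>();
        CounterSink reporter = (group, counter, amount) ->
                counters.merge(group + ":" + counter, amount, Long::sum);

        CountingReducer r0 = new CountingReducer(0);
        CountingReducer r1 = new CountingReducer(1);
        r0.reduce("b", Arrays.asList(1L).iterator(), reporter);
        r1.reduce("a", Arrays.asList(1L, 1L).iterator(), reporter);
        r1.reduce("c", Arrays.asList(1L).iterator(), reporter);

        System.out.println(counters);
    }
}
```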

It may be worth submitting a JIRA to make the reporter (or more likely
the Context in the new API) available in the partitioner.

Hope this helps.

-- 
Eric Sammer