Posted to mapreduce-user@hadoop.apache.org by ex...@nokia.com on 2012/02/16 18:49:49 UTC

Partitioners - How to know if they are working

Hello All,
I wrote my own partitioner and I would like to see if it's working.
By printing the return value of getPartition I could see that the partitions were different, but was the partitioner really working? To answer that, I collected the keys that every reducer task processed, and they were what I expected. It seems my partitioner is working properly, but that was not easy to find out.
Does anyone know if there is an easier way to see if your customized partitioner is working? For instance, a counter that shows how many partitions a map generated or a reducer received?
Thanks in advance,
Fabio Almeida

RE: Partitioners - How to know if they are working

Posted by ex...@nokia.com.
Hello David,

I am following your tip! Thanks.

Also, I configured a small cluster with three datanodes, and in my MR program I printed every single key that the reducers received. I set three reducers (setNumReduceTasks).

Analyzing the reducer outputs, I could see that the keys were distributed as my partitioner dictated.

Of course, I had to make things much smaller than in a real job. I prepared a small input, built a small cluster and so on, to keep everything under control.

Not that I doubt Hadoop, I doubt my code, always! :-)

Br,
Fabio Almeida 



-----Original Message-----
From: ext David Rosenstrauch [mailto:darose@darose.net] 
Sent: Friday, February 17, 2012 12:16 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Partitioners - How to know if they are working

On 02/16/2012 12:49 PM, ext-fabio.almeida@nokia.com wrote:
> Hello All,
> I wrote my own partitioner and I would like to see if it's working.
> By printing the return of method getPartition I could see that the partitions were different, but were they really working? To answer that I got the keys that every reducer task processed and that was what I expected. It seems my partitioner is working properly. But not easy to discover though.
> Does anyone know if there is an easier way to see if your customized partitioner is working? For instance, a counter that shows how many partitioners a map generated or a reducer received?
> Thanks in advance,
> Fabio Almeida

At my last job we wrote a custom partitioner, and we tested it out completely outside of Hadoop using standard JUnit unit tests.

HTH,

DR


Re: Partitioners - How to know if they are working

Posted by David Rosenstrauch <da...@darose.net>.
On 02/16/2012 12:49 PM, ext-fabio.almeida@nokia.com wrote:
> Hello All,
> I wrote my own partitioner and I would like to see if it's working.
> By printing the return of method getPartition I could see that the partitions were different, but were they really working? To answer that I got the keys that every reducer task processed and that was what I expected. It seems my partitioner is working properly. But not easy to discover though.
> Does anyone know if there is an easier way to see if your customized partitioner is working? For instance, a counter that shows how many partitioners a map generated or a reducer received?
> Thanks in advance,
> Fabio Almeida

At my last job we wrote a custom partitioner, and we tested it out 
completely outside of Hadoop using standard JUnit unit tests.

HTH,

DR
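
A minimal sketch of what such a test outside Hadoop might check, written in plain Java so it runs without Hadoop or JUnit on the classpath. The routing rule (first letter of the key, modulo the reducer count) and the class name FirstLetterPartitionerCheck are hypothetical, since the thread does not show the actual partitioner. In a real job the same arithmetic would presumably sit in the getPartition method of the custom Partitioner class, and the checks would become JUnit assertions.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical partitioning rule, kept free of Hadoop types so it can be
 * exercised in a plain JVM. Not the partitioner discussed in the thread.
 */
class FirstLetterPartitionerCheck {

    /** Assumed rule: route keys by their lower-cased first character, modulo the reducer count. */
    static int getPartition(String key, int numPartitions) {
        if (key.isEmpty()) {
            return 0; // assumption: empty keys go to partition 0
        }
        // Mask the sign bit so the result can never be negative.
        return (Character.toLowerCase(key.charAt(0)) & Integer.MAX_VALUE) % numPartitions;
    }

    /** Returns true if every key lands in [0, numPartitions) and maps there consistently. */
    static boolean isValid(List<String> keys, int numPartitions) {
        for (String key : keys) {
            int p = getPartition(key, numPartitions);
            if (p < 0 || p >= numPartitions) {
                return false; // out-of-range index
            }
            if (p != getPartition(key, numPartitions)) {
                return false; // same key must always map to the same partition
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "Avocado", "banana", "cherry", "");
        System.out.println("valid for 3 reducers: " + isValid(keys, 3));
        for (String key : keys) {
            System.out.println("'" + key + "' -> partition " + getPartition(key, 3));
        }
    }
}
```

The two properties checked here (index within range, same key always in the same partition) are the ones a reducer count of three would depend on. Expected assignments for known keys can be asserted the same way.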


Re: Partitioners - How to know if they are working

Posted by Harsh J <ha...@cloudera.com>.
Hi Fabio,

There are test cases in the MapReduce project releases that test
setting a custom partitioner and ensuring it works as intended.

But if you still wish to assure yourself, you should be able to add
a LOG statement to your custom Partitioner class's initialization
methods to indicate that it is being initialized, so that you can see
it in each map task's user logs.

There are other ways as well but essentially, there is no "fallback"
partitioner in case a user-specified partitioner is not initializable
- tasks would fail if you've misconfigured the partitioner.

For counters - there are no per-partition counters at the map end
(they could end up being too many depending on the number of reducers
you have for the job) but there are per-reduce-task input record
counters in each reduce task that you can use to get the number of
records that came into a specific partition.
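
As a rough illustration of how those counters could be used, the sketch below tallies, in plain Java, how many sample records a partitioning rule sends to each partition. The class name PartitionTally and the sample keys are made up for illustration, and the hash-modulo rule is only a stand-in for the partitioner under test. For the same input, the per-partition totals are what one would expect to see as the input record count of the corresponding reduce task.

```java
import java.util.Arrays;
import java.util.List;

/** Local tally of records per partition, for comparison with reduce-side counters. */
class PartitionTally {

    /** Stand-in for the partitioner under test (assumed hash-modulo rule). */
    static int getPartition(String key, int numPartitions) {
        // Mask the sign bit so negative hash codes still give a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Counts how many of the given keys fall into each partition index. */
    static long[] tally(List<String> keys, int numPartitions) {
        long[] counts = new long[numPartitions];
        for (String key : keys) {
            counts[getPartition(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: one entry per map output record.
        List<String> keys = Arrays.asList("a", "b", "c", "d", "a");
        System.out.println(Arrays.toString(tally(keys, 3))); // prints [1, 3, 1]
    }
}
```

Note that the tally counts records, not distinct keys: the key "a" appears twice and is counted twice in partition 1.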

For generally testing your MR code end to end, I recommend using the
Apache MRUnit library available at http://incubator.apache.org/mrunit/

On Thu, Feb 16, 2012 at 11:19 PM,  <ex...@nokia.com> wrote:
> Hello All,
>
> I wrote my own partitioner and I would like to see if it’s working.
>
> By printing the return of method getPartition I could see that the
> partitions were different, but were they really working? To answer that I
> got the keys that every reducer task processed and that was what I expected.
> It seems my partitioner is working properly. But not easy to discover
> though.
>
> Does anyone know if there is an easier way to see if your customized
> partitioner is working? For instance, a counter that shows how many
> partitioners a map generated or a reducer received?
>
> Thanks in advance,
>
> Fabio Almeida



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
