Posted to common-user@hadoop.apache.org by Richard G <gl...@hotmail.com> on 2009/09/12 22:39:25 UTC

How to retrieve the reducer output file names?

Hi,

For my application, I need to retrieve the output file name for each
reducer. Is there a convenient way to do that? I also need to know
which file comes from which reducer, so simply enumerating the output
directory doesn't work for me.

Thank you!
-- 
View this message in context: http://www.nabble.com/How-to-retrieve-the-reducer-output-file-names--tp25418039p25418039.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: How to retrieve the reducer output file names?

Posted by Amandeep Khurana <am...@gmail.com>.
I'm not sure if that is possible. I'd rather wait for someone to give a
more concrete answer.

However, I'm curious: what kind of application is it, and why do you need
to know which reducer produces which file? Also, have you thought about
designing the application some other way, so that it can run independently
of this information?


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz



Re: How to retrieve the reducer output file names?

Posted by Tarandeep Singh <ta...@gmail.com>.
The output of the mappers is partitioned. Each partition is given a number
starting from 0, and each reducer works on one of these partitions. In the
configure method of your reducer, you can get the partition number with:
jobConf.getInt("mapred.task.partition", 0);

If you use the default output format, the reducer working on partition 0
writes part-00000, the reducer working on partition 1 writes part-00001,
and so on.
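
The mapping between partition numbers and file names described above can be
sketched in plain Java, with no Hadoop dependency. This is only an
illustration that assumes the default part-NNNNN naming (five zero-padded
digits); the class and method names are hypothetical, and a custom output
format or a different naming scheme (for example part-r-00000) would not
match it.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: it only mirrors the
// part-NNNNN naming convention described in the message above.
public class PartFileNames {
    // Assumption: names look like "part-" followed by 5 to 9 digits.
    private static final Pattern PART = Pattern.compile("part-(\\d{5,9})");

    // Build the expected output file name for a reducer partition,
    // e.g. partition 3 gives "part-00003".
    public static String forPartition(int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    // Recover the partition number from a file name, or return -1 if
    // the name does not follow the assumed convention.
    public static int partitionOf(String fileName) {
        Matcher m = PART.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        int numReducers = 3; // illustrative value
        for (int p = 0; p < numReducers; p++) {
            System.out.println("reducer " + p + " -> " + forPartition(p));
        }
    }
}
```

With a helper like this, the driver can compute each reducer's expected file
name from the number of reduce tasks, or map a listed file back to its
partition, without each reducer having to report its own name.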

You can extend TextOutputFormat or SequenceFileOutputFormat (depending on
which output format you are using) and change the file name from part-xxxxx
to something else.

Hope this helps,
Tarandeep

