Posted to common-user@hadoop.apache.org by Christian Ulrik Søttrup <so...@nbi.dk> on 2008/09/25 10:48:21 UTC

hdfs output for both mapper and reducer

Hi all,

I am interested in saving the output of both the mapper and the
reducer in HDFS. Is there an efficient way of doing this?
Of course, I could just run the mapper followed by the identity reducer,
and then an identity mapper with my reducer. However,
it seems like a waste to run the framework twice. Is the sort between
the mapper and reducer efficient if it receives already sorted data?

cheers,
Christian

Re: hdfs output for both mapper and reducer

Posted by Mice <mi...@gmail.com>.
I think you can try org.apache.hadoop.mapred.lib.MultipleOutputs. It
will be released in 0.19, but you can apply the patch now.

This is just an idea; I am not sure whether it is efficient.
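
To make the single-pass idea concrete, here is a minimal, framework-free
sketch in plain Java. It is not Hadoop code, and the class and method names
(TeeMapOutput, map, run) are hypothetical. It only illustrates the dataflow:
every record the map step emits is appended to a side sink (standing in for
a file in HDFS) and is also handed on to the grouping and reduce step, so
the framework runs once. In a real job on the old org.apache.hadoop.mapred
API, the side sink would presumably be a named output registered with
MultipleOutputs.addNamedOutput and written through the collector returned
by getCollector inside map().

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Framework-free sketch: NOT Hadoop code. All names here are hypothetical.
final class TeeMapOutput {

    // Map step: split a line into words and emit (word, 1) for each word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return out;
    }

    // Run map and reduce in a single pass. Each map output record is
    // appended to sideOutput (the saved mapper output) and is also grouped
    // by key for the reduce step, which sums the counts per word.
    static Map<String, Integer> run(List<String> lines, List<String> sideOutput) {
        // TreeMap keeps keys sorted, loosely mimicking the shuffle's sort.
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                sideOutput.add(kv.getKey() + "\t" + kv.getValue());
                List<Integer> values = grouped.get(kv.getKey());
                if (values == null) {
                    values = new ArrayList<Integer>();
                    grouped.put(kv.getKey(), values);
                }
                values.add(kv.getValue());
            }
        }

        // Reduce step: sum the grouped values for each key.
        Map<String, Integer> reduced = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }
}
```

Calling TeeMapOutput.run(lines, side) returns the reducer output and leaves
the raw mapper records in side, so both results come out of one pass.
Whether this is cheaper than two jobs on a real cluster depends on the extra
write volume, which this sketch does not measure.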
