Posted to common-user@hadoop.apache.org by whitesky <wh...@gmail.com> on 2010/01/05 05:44:11 UTC

how to use InputSampler & TotalOrderPartitioner?

I want to use TotalOrderPartitioner to produce globally sorted results across
reducers. As far as I know, this partitioner needs a partition file, which is
generated by an input sampler. But it seems that all of these samplers can only
sample the job's input data. Why don't the samplers sample data from the
mappers' output? I think that would be more useful.

I'm new to Hadoop, so please correct me if I'm wrong.

Thanks in advance.
-- 
View this message in context: http://old.nabble.com/how-to-use-InputSampler---TotalOrderPartitioner--tp27023687p27023687.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: how to use InputSampler & TotalOrderPartitioner?

Posted by Jeff Zhang <zj...@gmail.com>.
Because the shuffle phase starts as soon as any map task finishes, and the
shuffle needs the Partitioner to route each mapper's output to the right
reducer, the sampler must complete before the shuffle phase starts.

Jeff Zhang
