Posted to common-user@hadoop.apache.org by Karl Wettin <ka...@gmail.com> on 2008/04/16 21:07:07 UTC
aborting reducer
I have a job that, given a list of objects, finds the one with the least
distance to a given test object. All my reducer does is collect the
first result and ignore the rest.
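> private boolean processed = false;
> public void reduce(DoubleWritable distance, Iterator<LongWritable> keys,
>     OutputCollector<DoubleWritable, LongWritable> output,
>     Reporter reporter)
>     throws IOException {
>   if (processed) {
>     return; // the nearest object was already emitted; ignore the rest
>   }
>   // keys arrive sorted by distance, so the first one is the nearest
>   output.collect(distance, keys.next());
>   processed = true;
> }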
I'm not sure whether I'm doing something fundamentally wrong in designing
the mapper and the reducer, or whether I've come up with a new use case,
but it feels very inefficient to iterate through all those records and
deserialize them just to ignore the values. I went looking in the code
base to see if it was possible to abort the reduction/combination
iteration and found that a simple enough solution would be to throw some
exception (or have reduce return a boolean).
karl
Re: aborting reducer
Posted by Ted Dunning <td...@veoh.com>.
Would it be better to have lots of records arrive at the same reducer?
That has a simpler mechanism for ignoring data.
You can just add a (trivial) partition function in addition to your sort.
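
A minimal sketch of that idea in plain Java, with no Hadoop classes involved. It only simulates the data flow: every record is routed to one partition, the records are sorted by distance, and the consumer reads the first record and stops. The names NearestNeighborSketch, Candidate, partition and nearest are hypothetical and exist only for this illustration. In a real job the routing would presumably be a Partitioner implementation whose getPartition always returns 0, or simply a job configured with a single reduce task.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class NearestNeighborSketch {

    /** Hypothetical record: a candidate's id and its distance to the test object. */
    static final class Candidate {
        final long id;
        final double distance;

        Candidate(long id, double distance) {
            this.id = id;
            this.distance = distance;
        }
    }

    /** Stand-in for a trivial partition function: every key goes to partition 0. */
    static int partition(double distanceKey, int numPartitions) {
        return 0;
    }

    /** Stand-in for the single reducer: reads only the first record of the sorted stream. */
    static long nearest(List<Candidate> mapOutput) {
        List<Candidate> sorted = new ArrayList<>(mapOutput);
        // Plays the role of the framework's sort on the distance key.
        sorted.sort(Comparator.comparingDouble(c -> c.distance));
        Iterator<Candidate> it = sorted.iterator();
        if (!it.hasNext()) {
            throw new IllegalArgumentException("no candidates");
        }
        // The first record has the least distance; the rest are never touched.
        return it.next().id;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate(7L, 3.5));
        candidates.add(new Candidate(11L, 0.25));
        candidates.add(new Candidate(3L, 1.0));
        System.out.println(nearest(candidates));
    }
}
```

Because every record lands in the same partition, a single sorted stream holds the global minimum at its head, so no per-reducer flag is needed to decide when to stop.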