Posted to mapreduce-user@hadoop.apache.org by exception <ex...@taomee.com> on 2011/01/30 11:06:52 UTC

sort the values in reduce side

Hi,

I am running a simple inverted-index generation program in Hadoop that emits every word in a text file along with its offsets.
So the output key is Text and the output value is a list of LongWritable.

What I am trying to do is sort the offsets in the reduce function. For each key, I put every value into a List and sort it using Collections.sort().

This is the code snippet:

    offsetList.clear();
    for (LongWritable val : values)
    {
        offsetList.add(val);
    }
    Collections.sort(offsetList);

    for (LongWritable offset : offsetList)
    {
        ......
    }

But it doesn't work. It looks like all the elements in offsetList have been overwritten by the smallest value in values, even though offsetList and values have the same size.
Can I sort the data in this way?

Thanks.

Re: sort the values in reduce side

Posted by Harsh J <qw...@gmail.com>.
The reduce value iterator hands you a reference to a single object
that is reused for every value in the iteration, so adding that
reference to a list repeatedly stores the same object each time. If
you must build an entire collection in memory to sort (you could
explore how MapReduce itself can help sort with comparators/groupers,
which is more efficient), store a copy of each value object, for
example new LongWritable(val.get()), so that every list entry is a
distinct reference.
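
To make the object reuse concrete, here is a minimal sketch that runs without Hadoop. MutableLong and reusingValues are hypothetical stand-ins invented for this illustration: MutableLong plays the role of LongWritable, and reusingValues imitates a value iterator that refills one shared object. It is not Hadoop's actual implementation, only a model of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReusedValueDemo {

    // Hypothetical stand-in for LongWritable: a mutable, comparable box.
    static final class MutableLong implements Comparable<MutableLong> {
        private long value;
        MutableLong(long value) { this.value = value; }
        long get() { return value; }
        void set(long value) { this.value = value; }
        @Override
        public int compareTo(MutableLong other) {
            return Long.compare(value, other.value);
        }
    }

    // Imitates the reduce-side iterator: next() always returns the SAME
    // instance, refilled with the next raw value.
    static Iterable<MutableLong> reusingValues(final long[] raw) {
        return () -> new Iterator<MutableLong>() {
            private final MutableLong shared = new MutableLong(0);
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.length; }
            @Override public MutableLong next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    // As in the question: stores the iterator's reference directly.
    static List<Long> sortedWithoutCopy(long[] raw) {
        List<MutableLong> offsets = new ArrayList<>();
        for (MutableLong val : reusingValues(raw)) {
            offsets.add(val); // every entry is the same object
        }
        Collections.sort(offsets);
        return unwrap(offsets);
    }

    // The fix: store a fresh copy of each value before sorting.
    static List<Long> sortedWithCopy(long[] raw) {
        List<MutableLong> offsets = new ArrayList<>();
        for (MutableLong val : reusingValues(raw)) {
            offsets.add(new MutableLong(val.get())); // distinct object per entry
        }
        Collections.sort(offsets);
        return unwrap(offsets);
    }

    // Converts the boxes to plain Long values for printing and comparison.
    static List<Long> unwrap(List<MutableLong> boxes) {
        List<Long> out = new ArrayList<>();
        for (MutableLong b : boxes) {
            out.add(b.get());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] raw = {30, 10, 20};
        System.out.println(sortedWithoutCopy(raw)); // [20, 20, 20]
        System.out.println(sortedWithCopy(raw));    // [10, 20, 30]
    }
}
```

In this model the uncopied list ends up holding whichever value was read last, repeated once per element, which matches the "same size, one value" symptom in the question. In the real reducer the corresponding change would be offsetList.add(new LongWritable(val.get())) in place of offsetList.add(val).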

On Sun, Jan 30, 2011 at 3:36 PM, exception <ex...@taomee.com> wrote:
> Hi,
>
> I am running a simple inverted-index generation program in Hadoop that
> emits every word in a text file along with its offsets.
>
> So the output key is Text and the output value is a list of LongWritable.
>
> What I am trying to do is sort the offsets in the reduce function. For each
> key, I put every value into a List and sort it using Collections.sort().
>
> This is the code snippet:
>
>     offsetList.clear();
>     for (LongWritable val : values)
>     {
>         offsetList.add(val);
>     }
>     Collections.sort(offsetList);
>
>     for (LongWritable offset : offsetList)
>     {
>         ……
>     }
>
> But it doesn’t work. It looks like all the elements in offsetList have been
> overwritten by the smallest value in values, even though offsetList and
> values have the same size.
>
> Can I sort the data in this way?
>
> Thanks.



-- 
Harsh J
www.harshj.com