Posted to mapreduce-user@hadoop.apache.org by exception <ex...@taomee.com> on 2011/01/30 11:06:52 UTC
sort the values in reduce side
Hi,
I am running a simple inverted-index program in Hadoop that emits every word in a text file together with its offsets.
So the output key is Text and the output value is a list of LongWritable.
What I am trying to do is sort the offsets in the reduce function. For each key, I put every value into a List and sort it using Collections.sort().
This is the code snippet:
offsetList.clear();
for (LongWritable val : values) {
    offsetList.add(val);
}
Collections.sort(offsetList);
for (LongWritable offset : offsetList) {
    ......
}
But it doesn't work. It looks like every element in offsetList has been overwritten by the smallest value in values, even though offsetList and values have the same size.
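To illustrate the symptom, here is a self-contained Java simulation that does not depend on Hadoop. `MutableLong` is a hypothetical stand-in for LongWritable, and `reusingValues()` is a hypothetical iterable that, like the reduce-side value iterator described in the reply below, hands back the same object on every step and overwrites its contents. Storing those references yields a list whose elements are all one object (in this sketch they all hold the last value produced); copying each value before storing it, as the reply suggests, gives the expected sorted offsets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical stand-in for a mutable Writable such as LongWritable.
    static final class MutableLong implements Comparable<MutableLong> {
        private long value;

        MutableLong(long value) { this.value = value; }

        long get() { return value; }

        void set(long value) { this.value = value; }

        @Override
        public int compareTo(MutableLong other) {
            return Long.compare(value, other.value);
        }
    }

    // Hypothetical iterable that returns the SAME object on every step and
    // overwrites its contents, mimicking the reuse described in the reply.
    static Iterable<MutableLong> reusingValues(long[] raw) {
        MutableLong shared = new MutableLong(0);
        return () -> new Iterator<MutableLong>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < raw.length; }

            @Override
            public MutableLong next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    // Sorts the list, then unwraps the values so they can be printed/compared.
    private static List<Long> sortAndUnwrap(List<MutableLong> offsetList) {
        Collections.sort(offsetList);
        List<Long> out = new ArrayList<>();
        for (MutableLong offset : offsetList) {
            out.add(offset.get());
        }
        return out;
    }

    // Original pattern: stores the reused reference on every iteration.
    static List<Long> sortByReference(long[] raw) {
        List<MutableLong> offsetList = new ArrayList<>();
        for (MutableLong val : reusingValues(raw)) {
            offsetList.add(val);
        }
        return sortAndUnwrap(offsetList);
    }

    // Fixed pattern: stores a fresh copy of each value
    // (with Hadoop this would be new LongWritable(val.get())).
    static List<Long> sortByCopy(long[] raw) {
        List<MutableLong> offsetList = new ArrayList<>();
        for (MutableLong val : reusingValues(raw)) {
            offsetList.add(new MutableLong(val.get()));
        }
        return sortAndUnwrap(offsetList);
    }

    public static void main(String[] args) {
        long[] offsets = {42, 7, 19};
        System.out.println(sortByReference(offsets)); // [19, 19, 19]
        System.out.println(sortByCopy(offsets));      // [7, 19, 42]
    }
}
```

Which single value a real job shows in every slot presumably depends on the order in which the values arrive; the point is that the list holds one object, not many.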
Can I sort the data in this way?
Thanks.
Re: sort the values in reduce side
Posted by Harsh J <qw...@gmail.com>.
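The reduce's value iterator gives you a reference to a single object
that is reused for every value it returns (and across reduce calls),
so adding val to a list stores the same reference over and over. If
you must build an entire collection in memory to sort (you could also
explore how MapReduce itself can sort the values for you with a custom
sort comparator and grouping comparator, i.e. a secondary sort, which
is more efficient), copy each value before holding it in the list:
for example new LongWritable(val.get()), or WritableUtils.clone(val,
conf) for a general Writable. LongWritable does not expose a public
clone() method.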
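The comparator/grouper approach is usually called a secondary sort: put the offset into the map output key, let the shuffle sort by (word, offset), and group reduce calls by word only, so each reduce call sees its offsets already in order without buffering them. The sketch below simulates that in plain Java; `WordOffset`, `SORT`, `GROUP` and `shuffleAndReduce()` are hypothetical names, and partitioning is not modelled. In an actual job the two comparators would be registered with Job.setSortComparatorClass() and Job.setGroupingComparatorClass(), and a partitioner set with Job.setPartitionerClass() that looks at the word only is needed so that all keys for one word reach the same reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortDemo {

    // Hypothetical composite map output key: a word plus one of its offsets.
    static final class WordOffset {
        final String word;
        final long offset;

        WordOffset(String word, long offset) {
            this.word = word;
            this.offset = offset;
        }
    }

    // Sort comparator: word first, then offset, so the shuffle delivers
    // each word's offsets in ascending order.
    static final Comparator<WordOffset> SORT =
            Comparator.comparing((WordOffset k) -> k.word)
                      .thenComparingLong(k -> k.offset);

    // Grouping comparator: looks at the word only, so every key for one
    // word lands in the same reduce call.
    static final Comparator<WordOffset> GROUP =
            Comparator.comparing((WordOffset k) -> k.word);

    // Simulates sort + group (partitioning is not modelled): returns, per
    // word, the offsets exactly as the reduce call would receive them.
    static Map<String, List<Long>> shuffleAndReduce(List<WordOffset> mapOutput) {
        List<WordOffset> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<String, List<Long>> result = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            WordOffset first = sorted.get(i);
            List<Long> offsets = new ArrayList<>();
            // The "reducer" just appends offsets in arrival order; no
            // Collections.sort() is needed because SORT already ordered them.
            while (i < sorted.size() && GROUP.compare(first, sorted.get(i)) == 0) {
                offsets.add(sorted.get(i).offset);
                i++;
            }
            result.put(first.word, offsets);
        }
        return result;
    }

    public static void main(String[] args) {
        List<WordOffset> mapOutput = new ArrayList<>();
        mapOutput.add(new WordOffset("hadoop", 120));
        mapOutput.add(new WordOffset("sort", 64));
        mapOutput.add(new WordOffset("hadoop", 8));
        mapOutput.add(new WordOffset("sort", 3));
        mapOutput.add(new WordOffset("hadoop", 57));
        System.out.println(shuffleAndReduce(mapOutput)); // {hadoop=[8, 57, 120], sort=[3, 64]}
    }
}
```

The trade-off is a larger map output key, in exchange for a reducer that never has to hold all of a word's offsets in memory.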
--
Harsh J
www.harshj.com