Posted to mapreduce-user@hadoop.apache.org by ksgupta misc <ks...@gmail.com> on 2011/09/14 02:44:21 UTC

How to sort key,value pair by value(In ascending)

Hi,

I have content like this:
*10103*,1042279,*4*
*10070*,1001089,*5*
*10102*,1015504,*7*
*10080*,1024369,*7*
*10050*,1025671,*1*
...
from which I separated the key,value pairs and, after a single map and
reduce, got the following output:

10050  1
10070  5
10080  7
10102  7
10103  4
...

I need to sort the output <key,value> pairs by value (in ascending order).
Please let me know how I can go ahead.

Required output:
10050  1
10103  4
10070  5
10080  7
10102  7

Thanks in advance,
--Shashi

Re: How to sort key,value pair by value(In ascending)

Posted by Bejoy KS <be...@gmail.com>.

Shashi
Here you'd definitely need one MapReduce job to aggregate the values in the
reducer. To sort the output, in very simple terms, use a second MapReduce
job in which the map output key is the value from the first job's output and
the map output value is the key from the first job's output. A second
MapReduce job is certainly expensive.
Keep an eye on this thread, as the experts may comment if there are better
solutions to your problem.
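
The second job described above can be sketched in a few lines. This is a plain Python simulation of the idea, not Hadoop code: the names swap_mapper and sort_by_value are made up for illustration, and the sorted() call stands in for the sort that the framework performs on map output keys during the shuffle. Values that share a key (the two 7s here) arrive at the reducer together; the sketch orders them by id only to make the output deterministic, which Hadoop itself does not guarantee.

```python
from collections import defaultdict


def swap_mapper(key, value):
    """Second-job mapper: emit (value, key) so the sort happens on the value."""
    yield value, key


def sort_by_value(first_job_output):
    # Simulated shuffle: group the swapped pairs by their new key (the old value).
    groups = defaultdict(list)
    for key, value in first_job_output:
        for new_key, new_value in swap_mapper(key, value):
            groups[new_key].append(new_value)

    # Reducers see keys in sorted order; swap each pair back while writing.
    result = []
    for new_key in sorted(groups):
        # Ordering ids within one value is an assumption made for determinism.
        for original_key in sorted(groups[new_key]):
            result.append((original_key, new_key))
    return result


# Output of the first job, taken from the original question.
first_job_output = [(10050, 1), (10070, 5), (10080, 7), (10102, 7), (10103, 4)]
sorted_pairs = sort_by_value(first_job_output)
for key, value in sorted_pairs:
    print(key, value)
```

In a real job the swapped key would need a numeric type (for example IntWritable or DoubleWritable) rather than Text, since Text keys compare lexicographically and "10" would sort before "9".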

Regards
Bejoy.K.S




Re: How to sort key,value pair by value(In ascending)

Posted by ksgupta misc <ks...@gmail.com>.

Hi guys,
Thank you for your valuable suggestions.
I see this works fine in cases where the key values are unique.

In my use case the values are as follows:
*<bookid>,<eid>,<rating>*
0000012742,3244,1
0028604164,2344,3
0062059017,2344,5
0075546701,2344,1
0130213268,2344,8
0140105425,5675,3
0141304286,5677,6
0195052668,3453,8
0198775024,2342,9
0000012742,2346,2
0028604164,9789,4
0062059017,2346,3
0075546701,2345,2
0130213268,8907,4
0140105425,5675,5
0141304286,3457,6
0195052668,5678,7
0198775024,8975,8
0000012742,6798,3
0028604164,5434,7
0062059017,9754,4
0075546701,7890,6
0130213268,7655,7
0140105425,7564,8
0141304286,8433,3
0195052668,3252,6
0198775024,7765,7

My goal is to write a program that outputs the book IDs sorted (ascending)
by average rating.
I have completed the following steps:
1. Map: create <key,value> pairs and call context.write(key, value).
2. Reduce: for each key, compute (sum of ratings) / (number of entries for
that book) and call context.write(key, avg_rating).
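
The two steps above can be sketched as follows. This is a plain Python simulation rather than Hadoop code, and it assumes the book id is the key and the rating is the value; the function names are hypothetical. The sample rows are a subset of the data listed earlier in this message. Book ids are kept as strings so their leading zeros survive.

```python
from collections import defaultdict


def rating_mapper(line):
    """Parse '<bookid>,<eid>,<rating>' and emit (bookid, rating)."""
    book_id, _eid, rating = line.strip().split(",")
    yield book_id, int(rating)


def average_reducer(book_id, ratings):
    """Emit (bookid, average rating) for a single book."""
    yield book_id, sum(ratings) / len(ratings)


def run_average_job(lines):
    # Simulated shuffle: collect every rating under its book id.
    groups = defaultdict(list)
    for line in lines:
        for book_id, rating in rating_mapper(line):
            groups[book_id].append(rating)

    # Reducers receive keys in sorted order, one call per book id.
    output = []
    for book_id in sorted(groups):
        output.extend(average_reducer(book_id, groups[book_id]))
    return output


sample = [
    "0000012742,3244,1",
    "0000012742,2346,2",
    "0000012742,6798,3",
    "0028604164,2344,3",
    "0028604164,9789,4",
    "0028604164,5434,7",
]
averages = run_average_job(sample)
for book_id, avg in averages:
    print(f"{book_id},{avg:.1f}")
```

Note that this job's output is ordered by book id, not by average, which is why a further sorting step is still needed.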

Example output looks like:
0075546701,4.6
0062059017,2.1
0195052668,6.1
0198775024,2.7

My next step is to sort the book IDs in ascending order of average rating.
How can I write a program that produces output like the following?

0062059017,2.1
0198775024,2.7
0075546701,4.6
0195052668,6.1


Please let me know if my approach is wrong, as I am new to Hadoop.

Thanks in advance,
--Shashi.





Re: How to sort key,value pair by value(In ascending)

Posted by Sudharsan Sampath <su...@gmail.com>.

One way is to reverse the <key,value> output in the mapper to emit
<1, 10050> and, in the reducer, use a TreeSet to order your values. For each
value, output <value, key> in the reducer.

With this, the output will be sorted as you need within each reducer. If you
need a totally sorted output, you can use a single reducer or design your
partitioning logic accordingly.
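
To illustrate the "design your partitioning logic" option, here is a minimal Python sketch of range partitioning, simulated locally rather than written against the Hadoop API. The split points 3.0 and 5.0 and the names range_partition and total_sort are assumptions chosen for this example. Each simulated reducer receives one contiguous range of values and sorts only its own share, so concatenating the reducer outputs in index order gives a total order.

```python
import bisect


def range_partition(value, boundaries):
    """Return the reducer index for `value`; `boundaries` must be sorted."""
    return bisect.bisect_right(boundaries, value)


def total_sort(pairs, boundaries):
    # One bucket per simulated reducer: len(boundaries) + 1 in total.
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key, value in pairs:
        # Swap to (value, key) so ordering is driven by the value.
        buckets[range_partition(value, boundaries)].append((value, key))

    # Each reducer orders its own bucket (the role the TreeSet plays above),
    # then swaps the pair back on output.
    output = []
    for bucket in buckets:
        for value, key in sorted(bucket):
            output.append((key, value))
    return output


# Averages from the example output earlier in the thread.
pairs = [
    ("0075546701", 4.6),
    ("0062059017", 2.1),
    ("0195052668", 6.1),
    ("0198775024", 2.7),
]
result = total_sort(pairs, boundaries=[3.0, 5.0])
for book_id, avg in result:
    print(f"{book_id},{avg}")
```

With a single reducer (an empty boundaries list) the same function degenerates to one global sort, which is the simpler of the two options mentioned above.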

Thanks
Sudhan S
