Posted to common-user@hadoop.apache.org by Lac Trung <tr...@gmail.com> on 2012/04/24 04:38:41 UTC

Determine the key of Map function

Hello everyone!

I have a MapReduce problem :( like this:
I have 4 input files with 3 fields: teacherId, classId, numberOfStudent
(numberOfStudent is sorted in descending order for each teacher).
The output should be the top 30 classIds with the highest numberOfStudent for each teacher.
My approach is to write the job like the WordCount example, but I don't know how to
determine the key for the map function.
I ran the WordCount example and understand its code, but I have no experience
programming MapReduce.

Can anyone help me solve this problem?
Thanks so much!
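A minimal plain-Java sketch of the key choice (no Hadoop dependency, so it runs on its own), assuming each input line is `teacherId, classId, numberOfStudent` separated by commas. As with the word in WordCount, the field you want to group on becomes the key: here that is teacherId, so all of a teacher's classes meet in one reduce call. The class `TeacherMapStep`, the value type `ClassCount` and the method `map` are hypothetical; in a real job this parsing would sit inside a `Mapper.map` override, which with the default `TextInputFormat` receives each line as a `Text` value.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical sketch of the map step in plain Java (no Hadoop classes).
class TeacherMapStep {

    /** Map output value: one class and its student count. */
    static final class ClassCount {
        final String classId;
        final int students;

        ClassCount(String classId, int students) {
            this.classId = classId;
            this.students = students;
        }

        @Override
        public String toString() {
            return classId + "=" + students;
        }
    }

    /**
     * Parses "teacherId, classId, numberOfStudent" and returns
     * key = teacherId, value = (classId, numberOfStudent).
     * Keying by teacher sends all of a teacher's classes to one reduce call.
     */
    static Map.Entry<String, ClassCount> map(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        String teacherId = fields[0].trim();
        String classId = fields[1].trim();
        int students = Integer.parseInt(fields[2].trim());
        return new SimpleEntry<>(teacherId, new ClassCount(classId, students));
    }

    public static void main(String[] args) {
        Map.Entry<String, ClassCount> kv = map("Teacher1, Class11, 30");
        System.out.println(kv.getKey() + " -> " + kv.getValue()); // Teacher1 -> Class11=30
    }
}
```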


-- 
Lạc Trung
20083535

Re: Determine the key of Map function

Posted by Lac Trung <tr...@gmail.com>.
Thanks so much!


In reply to Devaraj k <de...@huawei.com> (April 24, 2012, 12:21):




-- 
Lạc Trung
20083535

RE: Determine the key of Map function

Posted by Devaraj k <de...@huawei.com>.
Hi Lac,

Based on my understanding of your problem description, you need to do the following:

1. Mapper: Write a mapper that reads the records from the input files and converts them into keys and values. The key should contain the teacher id, class id and number of students; the value can be empty (or null).
2. Partitioner: Write a custom partitioner that sends all the records for a teacher id to one reducer.
3. Grouping Comparator: Write a comparator that groups the records by teacher id.
4. Sorting Comparator: Write a comparator that sorts the records by teacher id and number of students.
5. Reducer: The reducer receives the records for each teacher one after another, sorted by number of students within each teacher id. Keep as many top records as you want in the reducer and write them out.
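The five steps above are usually called a secondary sort. The sketch below simulates them in plain Java (no Hadoop classes) so each role is visible: `partition` routes by teacherId only, using the same `(hashCode & Integer.MAX_VALUE) % numReducers` formula as Hadoop's default `HashPartitioner`; `SORTING` orders by teacherId, then by student count descending; `GROUPING` compares teacherId only, so one reduce call sees one teacher's records largest-first and simply keeps the first n. All class and method names are hypothetical. In a real job the equivalents would be a `Partitioner` subclass and two comparator classes, registered with `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of the secondary-sort steps; all names are hypothetical.
class SecondarySortSketch {

    /** Step 1: composite map output key; the value can stay empty. */
    static final class TeacherClassKey {
        final String teacherId;
        final String classId;
        final int students;

        TeacherClassKey(String teacherId, String classId, int students) {
            this.teacherId = teacherId;
            this.classId = classId;
            this.students = students;
        }
    }

    /** Step 2 (partitioner): teacherId only, so one teacher always lands on one reducer. */
    static int partition(TeacherClassKey key, int numReducers) {
        return (key.teacherId.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Step 3 (grouping comparator): keys with equal teacherId share one reduce() call. */
    static final Comparator<TeacherClassKey> GROUPING =
        Comparator.comparing((TeacherClassKey k) -> k.teacherId);

    /** Step 4 (sort comparator): teacherId ascending, then students descending. */
    static final Comparator<TeacherClassKey> SORTING =
        GROUPING.thenComparing(Comparator.comparingInt((TeacherClassKey k) -> k.students).reversed())
                .thenComparing((TeacherClassKey k) -> k.classId);

    /** Steps 3-5 for one reducer's input: sort, split into groups, keep the first n of each. */
    static List<String> reduceAll(List<TeacherClassKey> reducerInput, int n) {
        List<TeacherClassKey> sorted = new ArrayList<>(reducerInput);
        sorted.sort(SORTING);
        List<String> out = new ArrayList<>();
        TeacherClassKey groupHead = null;
        int kept = 0;
        for (TeacherClassKey k : sorted) {
            if (groupHead == null || GROUPING.compare(groupHead, k) != 0) {
                groupHead = k;   // a new reduce() call would start here
                kept = 0;
            }
            if (kept < n) {      // records arrive largest-first, so the first n are the top n
                out.add(k.teacherId + "\t" + k.classId + "\t" + k.students);
                kept++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TeacherClassKey> input = Arrays.asList(
            new TeacherClassKey("Teacher2", "Class22", 44),
            new TeacherClassKey("Teacher1", "Class13", 28),
            new TeacherClassKey("Teacher1", "Class11", 30),
            new TeacherClassKey("Teacher2", "Class21", 45),
            new TeacherClassKey("Teacher1", "Class12", 29));
        int reducers = 2;
        for (int r = 0; r < reducers; r++) {
            // Each simulated reducer only sees the keys its partition number selects.
            List<TeacherClassKey> forThisReducer = new ArrayList<>();
            for (TeacherClassKey k : input) {
                if (partition(k, reducers) == r) {
                    forThisReducer.add(k);
                }
            }
            reduceAll(forThisReducer, 2).forEach(System.out::println);
        }
    }
}
```

Because the ordering comes from the sort comparator, the reducer never has to buffer a teacher's records; it only counts how many it has written.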

For reference, see this document:
http://www.inf.ed.ac.uk/publications/thesis/online/IM100859.pdf

Thanks
Devaraj

________________________________________
From: Lac Trung [trungnb3535@gmail.com]
Sent: Tuesday, April 24, 2012 10:11 AM
To: common-user@hadoop.apache.org
Subject: Re: Determine the key of Map function

Ah, as I said before, I have no experience programming MapReduce. So,
can you point me to some documents or websites about programming
the things you described above? ("A thousand things are hard at the start," as the Vietnamese saying goes)
Thanks so much ^^!


Re: Determine the key of Map function

Posted by Lac Trung <tr...@gmail.com>.
Ah, as I said before, I have no experience programming MapReduce. So,
can you point me to some documents or websites about programming
the things you described above? ("A thousand things are hard at the start," as the Vietnamese saying goes)
Thanks so much ^^!

In reply to Lac Trung <tr...@gmail.com> (April 24, 2012, 10:54):



-- 
Lạc Trung
20083535

Re: Determine the key of Map function

Posted by Lac Trung <tr...@gmail.com>.
Ah, as I said before, I have no experience programming MapReduce. So,
can you point me to some documents or websites about the things you
described above? ("A thousand things are hard at the start," as the Vietnamese saying goes)
Thanks so much ^^!

In reply to Lac Trung <tr...@gmail.com> (April 24, 2012, 10:54):



-- 
Lạc Trung
20083535

Re: Determine the key of Map function

Posted by Lac Trung <tr...@gmail.com>.
Thanks so much, Jay!
I will try this.
^^

In reply to Jay Vyas <ja...@gmail.com> (April 24, 2012, 10:52):




-- 
Lạc Trung
20083535

Re: Determine the key of Map function

Posted by Jay Vyas <ja...@gmail.com>.
Ahh... Well then, the key will be the teacher, and the value will simply be

<-1 * #students, class_id>.

Then, you will see in the reducer that the first 3 entries will always be
the ones you wanted.
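One caveat, stated with some hedging: Hadoop sorts map output by key only, and the order of values inside one reduce call is not guaranteed unless a secondary sort is configured (as in Devaraj's reply in this thread). A reducer for the key = teacher approach can avoid depending on value order by keeping the n largest counts in a bounded min-heap (`java.util.PriorityQueue`). This plain-Java sketch is hypothetical (`TopNPerTeacher`, `ClassCount` and `reduce` are illustrative names) and uses the Teacher1 rows from this thread in shuffled order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical reduce logic for key = teacherId; it does not depend on value order.
class TopNPerTeacher {

    /** Hypothetical value type: one class and its student count. */
    static final class ClassCount {
        final String classId;
        final int students;

        ClassCount(String classId, int students) {
            this.classId = classId;
            this.students = students;
        }
    }

    /** "Worst first": fewer students first; on ties, the larger classId counts as worse. */
    private static final Comparator<ClassCount> WORST_FIRST =
        Comparator.comparingInt((ClassCount c) -> c.students)
                  .thenComparing(Comparator.comparing((ClassCount c) -> c.classId).reversed());

    /** Returns the n classes with the most students for one teacher, largest first. */
    static List<String> reduce(String teacherId, Iterable<ClassCount> values, int n) {
        PriorityQueue<ClassCount> heap = new PriorityQueue<>(WORST_FIRST);
        for (ClassCount v : values) {
            heap.add(v);
            if (heap.size() > n) {
                heap.poll();   // evict the current worst, so at most n entries remain
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            ClassCount c = heap.poll();   // polls worst first ...
            out.add(0, teacherId + "\t" + c.classId + "\t" + c.students);   // ... so prepend
        }
        return out;
    }

    public static void main(String[] args) {
        // Values deliberately out of order, as a reducer may see them.
        List<ClassCount> values = Arrays.asList(
            new ClassCount("Class13", 28), new ClassCount("Class11", 30),
            new ClassCount("Class14", 27), new ClassCount("Class12", 29));
        // Prints the Class11, Class12 and Class13 lines, in that order.
        reduce("Teacher1", values, 3).forEach(System.out::println);
    }
}
```

Memory stays at about n entries per teacher. In a real Hadoop reducer the framework typically reuses one value object while iterating, so each value would need to be copied before it is stored in the heap.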

In reply to Lac Trung <tr...@gmail.com> (Mon, Apr 23, 2012 at 10:17 PM):




-- 
Jay Vyas
MMSB/UCHC

Re: Determine the key of Map function

Posted by Lac Trung <tr...@gmail.com>.
Hi Jay!
I think it's a bit different here. For each teacherId, I want to get the 30 classIds
that have the most students.
For example, getting the top 3 classIds:
(File1)
1) Teacher1, Class11, 30
2) Teacher1, Class12, 29
3) Teacher1, Class13, 28
4) Teacher1, Class14, 27
... n ...

n+1) Teacher2, Class21, 45
n+2) Teacher2, Class22, 44
n+3) Teacher2, Class23, 43
n+4) Teacher2, Class24, 42
... n+m ...

=> return lines 1, 2, 3 for Teacher1 and lines n+1, n+2, n+3 for Teacher2


In reply to Jay Vyas <ja...@gmail.com> (April 24, 2012, 09:52):




-- 
Lạc Trung
20083535

Re: Determine the key of Map function

Posted by Jay Vyas <ja...@gmail.com>.
It's somewhat tricky to understand exactly what you need from your
explanation, but I believe you want the teachers who have the most students in
a given class. So for English, I have 10 teachers teaching the class, and
I want the ones with the highest # of students.

You can output key=<classid>, value=<-1*#ofstudent,teacherid>.

The values will then be sorted by # of students. You can thus pick the
teacher in the first value of your reducer, and that will be the
teacher for class id = xyz with the highest number of students.
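The `-1 * #ofstudent` trick works because an ascending sort on the negated count is a descending sort on the count. A hedged plain-Java illustration (hypothetical names `NegatedCountOrder`, `NegCountValue`, `teachersByClassSize`), with two caveats: the negated count must be compared as a number, since as text `"-29"` sorts before `"-30"`; and because Hadoop sorts keys only, the values arrive in this order only if the pair is moved into the key with a secondary sort, or if the reducer sorts them itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrates the "-1 * #ofstudent" trick: ascending on -count is descending on count.
class NegatedCountOrder {

    /** Hypothetical value pair for key = classId: (-1 * #ofstudent, teacherId). */
    static final class NegCountValue {
        final int negStudents;
        final String teacherId;

        NegCountValue(int students, String teacherId) {
            this.negStudents = -students;   // store -1 * #ofstudent
            this.teacherId = teacherId;
        }
    }

    /** Plain ascending order, comparing the first field numerically, then teacherId. */
    static final Comparator<NegCountValue> ASCENDING =
        Comparator.comparingInt((NegCountValue v) -> v.negStudents)
                  .thenComparing((NegCountValue v) -> v.teacherId);

    /** Sorts ascending and returns the teacher ids; the biggest class comes first. */
    static List<String> teachersByClassSize(List<NegCountValue> values) {
        List<NegCountValue> sorted = new ArrayList<>(values);
        sorted.sort(ASCENDING);
        List<String> ids = new ArrayList<>();
        for (NegCountValue v : sorted) {
            ids.add(v.teacherId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<NegCountValue> values = Arrays.asList(
            new NegCountValue(29, "T2"), new NegCountValue(100, "T3"), new NegCountValue(30, "T1"));
        System.out.println(teachersByClassSize(values));   // [T3, T1, T2]
        // Compared as text, "-100" < "-29" < "-30": not numeric order, hence the int field above.
        System.out.println("-29".compareTo("-30") < 0);   // true
    }
}
```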

You can also be smart in your mapper by running a combiner to remove the
teacherids that are clearly not maximal.
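Top-n is a good fit for a combiner because every member of the overall top n is also in the top n of whichever part it came from, so pruning each mapper's output early cannot change the final answer. Hadoop may run a combiner zero, one or several times, which is safe here because applying top-n again changes nothing. A minimal plain-Java sketch that simulates two mappers (hypothetical names `TopNCombine`, `topN`, `combineThenReduce`; counts only, with ids omitted for brevity; assumes n >= 0):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical combiner-style pruning: each mapper forwards only its n largest counts per key.
class TopNCombine {

    /** The n largest counts, largest first. Idempotent: topN(topN(x, n), n) == topN(x, n). */
    static List<Integer> topN(List<Integer> counts, int n) {
        List<Integer> sorted = new ArrayList<>(counts);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Two mappers combine locally, then the reducer takes the top n of the survivors. */
    static List<Integer> combineThenReduce(List<Integer> split1, List<Integer> split2, int n) {
        List<Integer> survivors = new ArrayList<>(topN(split1, n));
        survivors.addAll(topN(split2, n));
        return topN(survivors, n);
    }

    public static void main(String[] args) {
        List<Integer> split1 = Arrays.asList(30, 27, 12, 5);
        List<Integer> split2 = Arrays.asList(29, 28, 1);
        System.out.println(combineThenReduce(split1, split2, 3));   // [30, 29, 28]
    }
}
```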

In reply to Lac Trung <tr...@gmail.com> (Mon, Apr 23, 2012 at 9:38 PM):




-- 
Jay Vyas
MMSB/UCHC