Posted to user@pig.apache.org by jamal sasha <ja...@gmail.com> on 2012/10/22 04:40:04 UTC

matrix multiplication

Hi,
  I am trying to do matrix multiplication using Pig.

Basically I have data in the form:
data1.txt
item1,item2,0.3
item1,item3,0.4
item1,item5,0.6

And then I have another data set in the form:
data2.txt
user1,item1
user1,item2
user1,item5
...
user2,item2
etc

Just to give some context: I am trying to build a top-N recommendation
system, which works as follows.
Matrix formed by data2.txt
        item1   item2   item3   item4   item5
user1   1       1       0       0       1


Matrix formed by data1.txt

        item1   item2   item3   item4   item5
item1   1       0.3     0.4     0       0.6
item2           1
item3                   1
item4                           1
item5                                   1


So the recommendations for user1 are based on a score computed as
follows (ignoring the item1,item1 entry):
Score for user1 for item1 = u12*item_12 + u13*item_13 + u14*item_14 +
u15*item_15
                          = 1*0.3 + 0*0.4 + 0*0 + 1*0.6 = 0.9

Then I compute this score for user1 and item2, then for user2 and item1,
and so on.

I understand this is more of an implementation challenge, and I am not
sure whether this is the right place to ask, but any suggestions would be
greatly appreciated.
Thanks
Jamal

Re: matrix multiplication

Posted by Gunther Hagleitner <gh...@hortonworks.com>.
Search for 'nested foreach' statements in the link I sent. You can use
ORDER BY and LIMIT within these statements and I think that's what you're
looking for.

Thanks,
Gunther.
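
A minimal sketch of the nested foreach approach, assuming the scores were
stored as tab-separated (user, item, score) rows in 'result'. The field
names, the aliases (scores, by_user, sorted, best, top_per_user) and the
output path 'top20_per_user' are made up for illustration:

```pig
-- Assumed schema for the stored output of the multiplication script.
scores = load 'result' as (user:chararray, item:chararray, score:float);

-- One group (a bag of score rows) per user.
by_user = group scores by user;

-- Inside the nested block, sort and limit each user's bag on its own.
top_per_user = foreach by_user {
    sorted = order scores by score desc;
    best = limit sorted 20;
    generate flatten(best);
};

store top_per_user into 'top20_per_user';
```

Because the ORDER BY and LIMIT run inside the foreach, they apply to each
user's bag separately rather than to the whole relation.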

On Mon, Oct 22, 2012 at 10:20 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
> Thanks for the reply.
> But how do I sort by score within each user group instead of across the
> entire list? For each user group I want the top 20, rather than the
> top 20 of the whole list.
> Any ideas? :(
> Thanks

Re: matrix multiplication

Posted by jamal sasha <ja...@gmail.com>.
Hi,
Thanks for the reply.
But how do I sort by score within each user group instead of across the
entire list? For each user group I want the top 20, rather than the
top 20 of the whole list.
Any ideas? :(
Thanks

On Monday, October 22, 2012, Gunther Hagleitner <gh...@hortonworks.com>
wrote:
> That's fairly straightforward. Take a look at:
> http://pig.apache.org/docs/r0.10.0/basic.html (order by, limit).
>
> Thanks,
> Gunther.

Re: matrix multiplication

Posted by Gunther Hagleitner <gh...@hortonworks.com>.
That's fairly straightforward. Take a look at:
http://pig.apache.org/docs/r0.10.0/basic.html (order by, limit).

Thanks,
Gunther.
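
A short sketch of the ORDER BY and LIMIT combination for a global top 20,
assuming it runs as a follow-up script over the stored 'result' output.
The field names, the aliases (scores, ranked, top20) and the output path
'top20' are hypothetical:

```pig
-- Assumed schema for the rows written by "store result into 'result'".
scores = load 'result' as (user:chararray, item:chararray, score:float);

-- Sort the whole relation by score, highest first, then keep 20 rows.
ranked = order scores by score desc;
top20 = limit ranked 20;

store top20 into 'top20';
```

This ranks all (user, item) pairs together, so it yields the 20 best
scores overall, not 20 per user.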

On Mon, Oct 22, 2012 at 7:12 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
> Great. Thanks a lot.
> How do I sort the result by score and select the top 20 (say)?

Re: matrix multiplication

Posted by jamal sasha <ja...@gmail.com>.
Hi,
Great. Thanks a lot.
How do I sort the result by score and select the top 20 (say)?

On Monday, October 22, 2012, Gunther Hagleitner <gh...@hortonworks.com>
wrote:
> This should work:
>
> matrix = load 'data1.txt' using PigStorage(',') as (row:chararray,
> column:chararray, value:float);
> vectors = load 'data2.txt' using PigStorage(',') as (user:chararray,
> column:chararray);
>
> joined = join vectors by column, matrix by column;
> groups = group joined by (user, row);
> result = foreach groups generate group.user, group.row, (float)
> SUM(joined.value);
>
> store result into 'result';
>
> Thanks,
> Gunther.

Re: matrix multiplication

Posted by Gunther Hagleitner <gh...@hortonworks.com>.
This should work:

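-- Item-item matrix stored as (row, column, value) triples.
matrix = load 'data1.txt' using PigStorage(',')
    as (row:chararray, column:chararray, value:float);
-- One (user, item) pair per line; the item is the matrix column to match.
vectors = load 'data2.txt' using PigStorage(',')
    as (user:chararray, column:chararray);

-- Pair each user's items with the matrix entries in the same column.
joined = join vectors by column, matrix by column;
-- Sum the matched values per (user, row) to get each score.
groups = group joined by (user, row);
result = foreach groups generate group.user, group.row,
    (float) SUM(joined.value);

store result into 'result';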

Thanks,
Gunther.
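
One possible caveat, sketched under an assumption: if the real data1.txt
has spaces after the commas, PigStorage(',') keeps them, so ' item3' would
not join with 'item3'. A hedged way to guard against that is to load the
fields as chararray and trim them first. The alias raw_matrix is made up
for illustration:

```pig
-- Load every field as text so stray spaces can be stripped before use.
raw_matrix = load 'data1.txt' using PigStorage(',')
    as (row:chararray, column:chararray, value:chararray);

-- TRIM removes leading and trailing whitespace; the cast restores the float.
matrix = foreach raw_matrix generate
    TRIM(row) as row, TRIM(column) as column, (float) TRIM(value) as value;
```

The rest of the script can then use matrix unchanged.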
