Posted to dev@mahout.apache.org by "Sebastian Schelter (JIRA)" <ji...@apache.org> on 2010/12/21 00:27:03 UTC

[jira] Updated: (MAHOUT-542) MapReduce implementation of ALS-WR

     [ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-542:
--------------------------------------

    Attachment: MAHOUT-542-2.patch

An updated version of the patch. I fixed a small bug, added more tests and polished the code a little.

The distributed matrix factorization works fine now on a toy example. The next steps will be to use real data and do some holdout tests.

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch
>
>
> As Mahout is currently lacking a distributed collaborative filtering algorithm that uses matrix factorization, I spent some time reading through a couple of the Netflix papers and stumbled upon the "Large-scale Parallel Collaborative Filtering for the Netflix Prize" available at http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with Weighted-λ-Regularization" to factorize the preference-matrix and gives some insights on how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using Map/Reduce, so I sat down and created a prototype version. I'm not really sure I got the mathematical details correct (they need some optimization anyway), but I want to put up my prototype implementation here per Yonik's law of patches.
> Maybe someone has the time and motivation to work on this with me a little. It would be great if someone could validate the approach taken (I'm willing to help, as the code might not be intuitive to read) and then try to factorize some test data and give feedback.
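The alternating scheme described above is easy to sketch outside of Hadoop. The following is a hypothetical pure-Python illustration (it is not the attached patch, which is Java/MapReduce): with one factor matrix held fixed, each user's (and then each item's) feature vector is recomputed independently by solving a small k-by-k ridge system, which is exactly the per-row work a mapper would do in the distributed version.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small k x k system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def features_for(observed, fixed, k, lam):
    """Solve (sum f f^T + lam * n * I) x = sum r * f, the weighted-lambda
    regularized normal equations for one user (or, symmetrically, one item)."""
    if not observed:
        return [0.0] * k
    n = len(observed)
    A = [[lam * n if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for other, rating in observed:
        f = fixed[other]
        for i in range(k):
            b[i] += rating * f[i]
            for j in range(k):
                A[i][j] += f[i] * f[j]
    return solve(A, b)

def als_wr(ratings, k=2, lam=0.05, iterations=15, seed=0):
    """ratings: dict (user, item) -> value. Returns user and item feature dicts."""
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    rng = random.Random(seed)
    U = {u: [rng.random() for _ in range(k)] for u in users}
    M = {i: [rng.random() for _ in range(k)] for i in items}
    by_user = {u: [(i, r) for (uu, i), r in ratings.items() if uu == u] for u in users}
    by_item = {i: [(u, r) for (u, ii), r in ratings.items() if ii == i] for i in items}
    for _ in range(iterations):
        # Fix M and re-solve every user, then fix U and re-solve every item.
        # Each solve is independent of the others, hence the easy parallelism.
        for u in users:
            U[u] = features_for(by_user[u], M, k, lam)
        for i in items:
            M[i] = features_for(by_item[i], U, k, lam)
    return U, M
```

On a toy matrix a handful of sweeps already reconstructs the observed entries closely. The `lam * n` weighting penalizes heavy raters and popular items more strongly, which is the "weighted" part of Weighted-λ-Regularization.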

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (MAHOUT-542) MapReduce implementation of ALS-WR

Posted by Ted Dunning <te...@gmail.com>.
Compared to a line search, the evolutionary algorithm should be reasonably
friendly in terms of the number of evaluations, but the function being
optimized here is almost certainly friendly enough for a line search.

The only caveat is that the evolutionary algorithm will do better in the
presence of noise.  The line search can get fooled once and never recover.
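For a single regularization parameter, the line search could be as simple as a golden-section search over a bracketing interval. This is a generic sketch (nothing here is from the patch; `f` stands in for a cross-validated error), and it also shows where the noise caveat comes from: each comparison permanently discards part of the bracket, so one misleading evaluation can throw away the interval containing the true minimum.

```python
import math

def golden_section(f, lo, hi, tol=1e-5):
    """Classic golden-section line search for a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; discard (d, b] for good.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; discard [a, c) for good.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration shrinks the bracket by a constant factor of about 0.618, so a handful of dozen evaluations pins the minimum down to high precision when the curve is clean.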

On Tue, Dec 21, 2010 at 9:00 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> There's evolutionary algorithm in SGD to find those in adaptive way using
> cross-validation but it may be too demanding in terms of # of experiments.
> just FYI
>
> On Mon, Dec 20, 2010 at 11:58 PM, Sebastian Schelter <
> ssc.open@googlemail.com> wrote:
>
> > Hi Dmitriy,
> >
> > the paper states that it's easy to find a good lambda value with 3-4
> > experiments. I still have to verify that assumption on a real dataset.
> >
> > --sebastian
> >
> >
> > On 21.12.2010 00:57, Dmitriy Lyubimov wrote:
> >
> >>  HI Sebastian,
> >>
> >> how do you come up with a good Lambda to use with this weighted ALS?
> >>

Re: [jira] Updated: (MAHOUT-542) MapReduce implementation of ALS-WR

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
There's an evolutionary algorithm in SGD to find those in an adaptive way using
cross-validation, but it may be too demanding in terms of the number of
experiments. Just FYI.
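A minimal stand-in for that kind of adaptive search is a (1+1) evolution strategy over lambda: mutate the current value, keep whichever of parent and child scores better. This is a hypothetical sketch, not Mahout's SGD machinery, and `objective` is assumed to be a cross-validated error.

```python
import random

def one_plus_one_es(objective, x0, steps=300, sigma=0.5, seed=0):
    """(1+1) evolution strategy: Gaussian mutation plus greedy selection.
    objective is assumed to be a cross-validated error to minimize."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    for _ in range(steps):
        candidate = x + rng.gauss(0, sigma)
        fc = objective(candidate)
        if fc < fx:  # keep the better of parent and child
            x, fx = candidate, fc
    return x
```

Every step costs one full cross-validation run, which is where the "# of experiments" concern comes from. The upside in the presence of noise is that a misleading evaluation can still be beaten by later candidates, whereas a bracketing line search never revisits an interval it has discarded.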

On Mon, Dec 20, 2010 at 11:58 PM, Sebastian Schelter <
ssc.open@googlemail.com> wrote:

> Hi Dmitriy,
>
> the paper states that it's easy to find a good lambda value with 3-4
> experiments. I still have to verify that assumption on a real dataset.
>
> --sebastian
>
>
> On 21.12.2010 00:57, Dmitriy Lyubimov wrote:
>
>>  HI Sebastian,
>>
>> how do you come up with a good Lambda to use with this weighted ALS?
>>

Re: [jira] Updated: (MAHOUT-542) MapReduce implementation of ALS-WR

Posted by Sebastian Schelter <ss...@googlemail.com>.
Hi Dmitriy,

the paper states that it's easy to find a good lambda value with 3-4 
experiments. I still have to verify that assumption on a real dataset.

--sebastian
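Such a holdout test could look like the following sketch (hypothetical names, nothing here is from the patch): withhold a fraction of the observed ratings, factorize the rest once per candidate lambda, and keep the lambda with the lowest RMSE on the withheld entries.

```python
import random

def holdout_split(ratings, fraction=0.2, seed=1):
    """Withhold a fraction of the observed ratings for evaluation."""
    entries = list(ratings.items())
    random.Random(seed).shuffle(entries)
    cut = int(len(entries) * fraction)
    return dict(entries[cut:]), dict(entries[:cut])  # (train, test)

def rmse(test, predict):
    """Root mean squared error of predict(user, item) on withheld ratings."""
    errors = [(predict(u, i) - r) ** 2 for (u, i), r in test.items()]
    return (sum(errors) / len(errors)) ** 0.5

def pick_lambda(ratings, lambdas, factorize):
    """factorize(train, lam) -> predict function; an assumed interface that a
    factorization job would provide. Returns the lambda with the best RMSE."""
    train, test = holdout_split(ratings)
    scores = {lam: rmse(test, factorize(train, lam)) for lam in lambdas}
    return min(scores, key=scores.get)
```

With the paper's claim of 3-4 experiments, `lambdas` would just be a short list such as `[0.01, 0.05, 0.1, 0.5]`, one factorization run each.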

On 21.12.2010 00:57, Dmitriy Lyubimov wrote:
> HI Sebastian,
>
> how do you come up with a good Lambda to use with this weighted ALS?
>


Re: [jira] Updated: (MAHOUT-542) MapReduce implementation of ALS-WR

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Hi Sebastian,

How do you come up with a good lambda to use with this weighted ALS?
