Posted to dev@mahout.apache.org by Saikat Kanjilal <sx...@hotmail.com> on 2013/06/08 01:15:45 UTC

Work on ALS for future releases

Sebastian et al,
Per the latest comments on MAHOUT-974, I'd like to get a deeper understanding of what future improvements or architectural changes are desired in the ALS subsection of the codebase. I'd love to help with the development efforts on this codebase and on the clustering subsection as well.

Regards

Sent from my iPhone

Re: Work on ALS for future releases

Posted by Ted Dunning <te...@gmail.com>.
If there is only one parameter to optimize, then line search is an easy
answer.

If there is more than one parameter, then the EvolutionarySearch that we
already have can work, or any of the many optimization methods from
Commons Math would apply. ES is better if the parameter space is
complicated.
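For the one-parameter case (tuning lambda alone), here is a minimal sketch of one common kind of line search, golden-section search, written in Java to match Mahout. It assumes the held-out error is roughly unimodal over the search interval. The objective passed in is a hypothetical stand-in for "factorize with this lambda and score on held-out data"; it is not an existing Mahout API.

```java
import java.util.function.DoubleUnaryOperator;

/** Golden-section line search: minimizes a unimodal function f on [lo, hi]. */
final class GoldenSectionSearch {
  // 1/phi, about 0.618; each step shrinks the bracket by this factor.
  private static final double INV_PHI = (Math.sqrt(5.0) - 1.0) / 2.0;

  private GoldenSectionSearch() {}

  static double minimize(DoubleUnaryOperator f, double lo, double hi, double tol) {
    double a = lo;
    double b = hi;
    double c = b - INV_PHI * (b - a); // left interior probe
    double d = a + INV_PHI * (b - a); // right interior probe
    double fc = f.applyAsDouble(c);
    double fd = f.applyAsDouble(d);
    while (b - a > tol) {
      if (fc < fd) {
        // The minimum lies in [a, d]; the old left probe becomes the new right probe.
        b = d;
        d = c;
        fd = fc;
        c = b - INV_PHI * (b - a);
        fc = f.applyAsDouble(c);
      } else {
        // The minimum lies in [c, b]; the old right probe becomes the new left probe.
        a = c;
        c = d;
        fc = fd;
        d = a + INV_PHI * (b - a);
        fd = f.applyAsDouble(d);
      }
    }
    return (a + b) / 2.0;
  }
}
```

For example, GoldenSectionSearch.minimize(x -> heldOutRmse(Math.pow(10.0, x)), -4.0, 2.0, 1e-2) would search lambda between 1e-4 and 100, where heldOutRmse is a hypothetical helper. Searching over log10(lambda) rather than lambda itself is a design choice: regularization strengths usually need to be explored across several orders of magnitude, and each probe reuses one function value from the previous step, so only one new factorization is needed per iteration.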


On Sat, Jun 8, 2013 at 10:41 PM, Sean Owen <sr...@gmail.com> wrote:

> Grid search to me is just trying all combinations of values for different
> parameters, and you try them with a cross-validation set. They aren't
> alternatives. I don't have any knowledge of clustering-related items.
> On Jun 8, 2013 7:42 PM, "Saikat Kanjilal" <sx...@hotmail.com> wrote:
>
> > Sebastian/Sean,
> > Thanks for your responses. First off, I am more familiar with
> > cross-validation than grid search; regardless, I'll go ahead and file
> > JIRA tasks for building tools to support both grid search and training
> > error checking.
> > What about the clustering subsection of the code? Is there any
> > cleanup/re-architecture anticipated there?
> >
> >
>

Re: Work on ALS for future releases

Posted by Sean Owen <sr...@gmail.com>.
Grid search to me is just trying all combinations of values for different
parameters, and you try them with a cross-validation set. They aren't
alternatives. I don't have any knowledge of clustering-related items.
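A minimal Java sketch of grid search as described here: every (lambda, numFeatures) combination is tried and scored on a cross-validation set, and the lowest error wins. Using numFeatures as the second parameter is an assumption for illustration, and the validationError callback is hypothetical; it stands in for whatever trains the factorization on the training split and measures error on the validation split.

```java
import java.util.function.ToDoubleBiFunction;

/** Exhaustive grid search over (lambda, numFeatures) pairs. */
final class GridSearch {
  /** Best combination found, with its validation error. */
  static final class Result {
    final double lambda;
    final int numFeatures;
    final double error;

    Result(double lambda, int numFeatures, double error) {
      this.lambda = lambda;
      this.numFeatures = numFeatures;
      this.error = error;
    }

    @Override
    public String toString() {
      return "lambda=" + lambda + ", numFeatures=" + numFeatures + ", error=" + error;
    }
  }

  private GridSearch() {}

  /**
   * Tries every combination and keeps the one with the lowest validation error.
   * Returns null if either array is empty.
   */
  static Result search(double[] lambdas, int[] featureCounts,
                       ToDoubleBiFunction<Double, Integer> validationError) {
    Result best = null;
    for (double lambda : lambdas) {
      for (int numFeatures : featureCounts) {
        // Hypothetical callback: train with these parameters, score on the validation set.
        double error = validationError.applyAsDouble(lambda, numFeatures);
        if (best == null || error < best.error) {
          best = new Result(lambda, numFeatures, error);
        }
      }
    }
    return best;
  }
}
```

The cost is one full factorization per combination, so the number of runs is the product of the grid sizes; that is why it pays to keep the grid coarse (for example, lambda on a log scale) and refine around the best point afterwards.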
On Jun 8, 2013 7:42 PM, "Saikat Kanjilal" <sx...@hotmail.com> wrote:

> Sebastian/Sean,
> Thanks for your responses. First off, I am more familiar with
> cross-validation than grid search; regardless, I'll go ahead and file
> JIRA tasks for building tools to support both grid search and training
> error checking.
> What about the clustering subsection of the code? Is there any
> cleanup/re-architecture anticipated there?
>
>

Re: Work on ALS for future releases

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Sebastian/Sean,
Thanks for your responses. First off, I am more familiar with cross-validation than grid search; regardless, I'll go ahead and file JIRA tasks for building tools to support both grid search and training error checking. What about the clustering subsection of the code? Is there any cleanup/re-architecture anticipated there?

On Jun 8, 2013, at 10:19 AM, Sean Owen <sr...@gmail.com> wrote:

> PS: You might find it useful to pinch a bit of this code to implement
> convergence checking and grid search. It's not quite the same code base,
> but it maps quite directly:
> 
> https://code.google.com/p/myrrix-recommender/source/browse/trunk/online/src/net/myrrix/online/factorizer/als/AlternatingLeastSquares.java#226
> https://code.google.com/p/myrrix-recommender/source/browse/trunk/online/src/net/myrrix/online/eval/ParameterOptimizer.java#111
> 
> On Sat, Jun 8, 2013 at 6:05 PM, Sebastian Schelter <ss...@apache.org> wrote:
>> Hi Saikat,
>> 
>> Great that you want to work on the ALS code. I think it is very
>> important to make it easier to use; ideally, no knowledge of the papers
>> and formulas should be necessary.
>> 
>> As you know, the ALS code has a hyperparameter lambda that needs to be
>> tuned in order to get a good factorization. Are you familiar with grid
>> search and cross validation? It would be awesome to add some tooling for
>> them to Mahout that helps users to easily find a good lambda.
>> 
>> Another important new feature would be to check the training error of
>> the factorization during the computation so that it automatically detects
>> convergence. That way, users would not have to specify the number of
>> iterations to execute as a parameter.
>> 
>> What do you think? Does this sound reasonable to you?
>> 
>> Best,
>> Sebastian
>> 
>> 
>> On 08.06.2013 01:15, Saikat Kanjilal wrote:
>>> Sebastian et al,
>>> Per the latest comments on MAHOUT-974, I'd like to get a deeper understanding of what future improvements or architectural changes are desired in the ALS subsection of the codebase. I'd love to help with the development efforts on this codebase and on the clustering subsection as well.
>>> 
>>> Regards
>>> 
>>> Sent from my iPhone
> 

Re: Work on ALS for future releases

Posted by Sean Owen <sr...@gmail.com>.
PS: You might find it useful to pinch a bit of this code to implement
convergence checking and grid search. It's not quite the same code base,
but it maps quite directly:

https://code.google.com/p/myrrix-recommender/source/browse/trunk/online/src/net/myrrix/online/factorizer/als/AlternatingLeastSquares.java#226
https://code.google.com/p/myrrix-recommender/source/browse/trunk/online/src/net/myrrix/online/eval/ParameterOptimizer.java#111

On Sat, Jun 8, 2013 at 6:05 PM, Sebastian Schelter <ss...@apache.org> wrote:
> Hi Saikat,
>
> Great that you want to work on the ALS code. I think it is very
> important to make it easier to use; ideally, no knowledge of the papers
> and formulas should be necessary.
>
> As you know, the ALS code has a hyperparameter lambda that needs to be
> tuned in order to get a good factorization. Are you familiar with grid
> search and cross validation? It would be awesome to add some tooling for
> them to Mahout that helps users to easily find a good lambda.
>
> Another important new feature would be to check the training error of
> the factorization during the computation so that it automatically detects
> convergence. That way, users would not have to specify the number of
> iterations to execute as a parameter.
>
> What do you think? Does this sound reasonable to you?
>
> Best,
> Sebastian
>
>
> On 08.06.2013 01:15, Saikat Kanjilal wrote:
>> Sebastian et al,
>> Per the latest comments on MAHOUT-974, I'd like to get a deeper understanding of what future improvements or architectural changes are desired in the ALS subsection of the codebase. I'd love to help with the development efforts on this codebase and on the clustering subsection as well.
>>
>> Regards
>>
>> Sent from my iPhone
>>
>

Re: Work on ALS for future releases

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Saikat,

Great that you want to work on the ALS code. I think it is very
important to make it easier to use; ideally, no knowledge of the papers
and formulas should be necessary.

As you know, the ALS code has a hyperparameter lambda that needs to be
tuned in order to get a good factorization. Are you familiar with grid
search and cross validation? It would be awesome to add some tooling for
them to Mahout that helps users to easily find a good lambda.
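To score a candidate lambda with cross-validation rather than a single held-out split, one option is k-fold cross-validation, sketched minimally below in Java. Each fold is held out once while the remaining ratings are used for training, and the held-out errors are averaged. The trainAndScore callback is hypothetical: it stands in for factorizing the training ratings with a fixed lambda and returning an error such as RMSE on the held-out ratings. Each list element is assumed to represent one observed (user, item, rating) entry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** k-fold cross-validation over a list of observed ratings. */
final class KFoldCrossValidation {
  private KFoldCrossValidation() {}

  /**
   * Shuffles the ratings, splits them into k folds, and returns the mean held-out error.
   * trainAndScore receives (training ratings, held-out ratings) for each fold.
   */
  static <T> double meanHeldOutError(List<T> ratings, int k, long seed,
                                     ToDoubleBiFunction<List<T>, List<T>> trainAndScore) {
    if (k < 2 || k > ratings.size()) {
      throw new IllegalArgumentException("need 2 <= k <= number of ratings");
    }
    List<T> shuffled = new ArrayList<>(ratings);
    Collections.shuffle(shuffled, new Random(seed)); // fixed seed => reproducible folds
    double total = 0.0;
    for (int fold = 0; fold < k; fold++) {
      List<T> train = new ArrayList<>();
      List<T> heldOut = new ArrayList<>();
      for (int i = 0; i < shuffled.size(); i++) {
        // Every k-th rating (offset by the fold index) is held out; the rest trains.
        if (i % k == fold) {
          heldOut.add(shuffled.get(i));
        } else {
          train.add(shuffled.get(i));
        }
      }
      total += trainAndScore.applyAsDouble(train, heldOut);
    }
    return total / k;
  }
}
```

A grid or line search over lambda can then use meanHeldOutError as its objective. Reusing the same seed for every candidate keeps the folds identical, so differences in the score reflect lambda rather than a different random split.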

Another important new feature would be to check the training error of
the factorization during the computation so that it automatically detects
convergence. That way, users would not have to specify the number of
iterations to execute as a parameter.
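A minimal Java sketch of the convergence check described here, assuming a hypothetical IterativeFactorizer hook that exposes one alternating sweep and the current training RMSE; this interface is an illustration, not part of Mahout. The driver stops as soon as the relative change in training RMSE between consecutive sweeps falls below a tolerance.

```java
/** Hypothetical hook into a factorizer; not an existing Mahout interface. */
interface IterativeFactorizer {
  /** Runs one alternating sweep: recompute user factors, then item factors. */
  void iterate();

  /** Returns the RMSE over the observed training ratings for the current factors. */
  double trainingRmse();
}

final class ConvergenceDriver {
  private ConvergenceDriver() {}

  /**
   * Runs sweeps until the relative change in training RMSE falls below relTol.
   * maxIterations is only a safety cap. Returns the number of sweeps executed.
   */
  static int runUntilConverged(IterativeFactorizer factorizer, double relTol, int maxIterations) {
    double previous = Double.NaN;
    for (int sweep = 1; sweep <= maxIterations; sweep++) {
      factorizer.iterate();
      double current = factorizer.trainingRmse();
      if (sweep > 1) {
        // Relative change; max() guards against dividing by zero on a perfect fit.
        double relChange = Math.abs(previous - current) / Math.max(previous, 1e-12);
        if (relChange < relTol) {
          return sweep;
        }
      }
      previous = current;
    }
    return maxIterations;
  }
}
```

The maxIterations cap only guards against runs that never settle; users would choose a tolerance instead of an iteration count. Measuring relative rather than absolute change keeps the same tolerance meaningful regardless of the rating scale.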

What do you think? Does this sound reasonable to you?

Best,
Sebastian


On 08.06.2013 01:15, Saikat Kanjilal wrote:
> Sebastian et al,
> Per the latest comments on MAHOUT-974, I'd like to get a deeper understanding of what future improvements or architectural changes are desired in the ALS subsection of the codebase. I'd love to help with the development efforts on this codebase and on the clustering subsection as well.
> 
> Regards
> 
> Sent from my iPhone
>