Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2012/09/18 14:23:07 UTC
[jira] [Commented] (MAHOUT-1069) Multi-target, side-info aware,
SGD-based recommender algorithms, examples, and tools to run
[ https://issues.apache.org/jira/browse/MAHOUT-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457769#comment-13457769 ]
Sean Owen commented on MAHOUT-1069:
-----------------------------------
I imagine this is all great work. As I commented off-list, it is a big enough and even different enough beast that it feels like it should be a separate project. The Mahout code base is already uneven and sprawling and I think this would exacerbate that -- and not generate much "synergy" worth the effort of integration.
> Multi-target, side-info aware, SGD-based recommender algorithms, examples, and tools to run
> -------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1069
> URL: https://issues.apache.org/jira/browse/MAHOUT-1069
> Project: Mahout
> Issue Type: Improvement
> Components: CLI, Collaborative Filtering
> Affects Versions: 0.8
> Reporter: Gokhan Capan
> Assignee: Sean Owen
> Labels: cf, improvement, sgd
> Attachments: MAHOUT-1069.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Following our conversations on the dev list, I would like to state that I have completed the merge into Mahout of the recommender algorithms mentioned in http://goo.gl/fh4d9.
> These are a set of learning algorithms for matrix factorization based recommendation, which are capable of:
> * Recommending multiple targets:
> *# Numerical Recommendation with OLS Regression
> *# Binary Recommendation with Logistic Regression
> *# Multinomial Recommendation with Softmax Regression
> *# Ordinal Recommendation with Proportional Odds Model
> * Leveraging side info in Mahout vector format where available
> *# User side information
> *# Item side information
> *# Dynamic side information (side info at feedback moment, such as proximity, day of week etc.)
> * Online learning
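As a rough illustration of the list above, the following Python sketch trains a factorization model with online SGD in which the score is the user-item factor dot product plus a linear term over side-information features. It is not the code in MAHOUT-1069.patch: the function names (`train`, `predict`, `score`), the hyperparameter defaults, and the tuple layout of the training data are all assumptions made for illustration. It relies on the fact that squared loss with an identity link and log loss with a sigmoid link share the same gradient form, (prediction - target) times the gradient of the score, so one update rule can cover both the numerical and the binary target.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(P, Q, w, u, i, x):
    # Raw score: user/item factor dot product plus a linear side-info term.
    factors = sum(a * b for a, b in zip(P[u], Q[i]))
    side = sum(a * b for a, b in zip(w, x))
    return factors + side


def train(data, n_users, n_items, n_side, k=4, link="identity",
          lr=0.05, reg=0.01, epochs=200, seed=0):
    """Online SGD over (user, item, side_features, target) tuples.

    link="identity" corresponds to squared loss (numerical targets);
    link="sigmoid" corresponds to log loss (binary targets).
    All parameter names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    w = [0.0] * n_side
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, x, y in data:
            s = score(P, Q, w, u, i, x)
            pred = sigmoid(s) if link == "sigmoid" else s
            err = pred - y  # same gradient form for both losses
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] -= lr * (err * qi + reg * pu)
                Q[i][f] -= lr * (err * pu + reg * qi)
            for j in range(n_side):
                w[j] -= lr * (err * x[j] + reg * w[j])
    return P, Q, w


def predict(model, u, i, x, link="identity"):
    # Apply the same link that was used during training.
    P, Q, w = model
    s = score(P, Q, w, u, i, x)
    return sigmoid(s) if link == "sigmoid" else s
```

Under these assumptions, `train(data, n_users, n_items, n_side)` returns the factor matrices and side-info weights, and `predict` applies the chosen link. The multinomial and ordinal targets listed above would need their own links and gradients and are not covered by this sketch.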
> Some command-line tools are provided as mahout jobs, for pre-experiment utilities and running experiments.
> Evaluation tools for numerical and categorical recommenders are added.
> A simple example for Movielens-1M data is provided, and it achieved pretty good results (0.851 RMSE on a randomly generated test set, after some validation on a separate validation set to determine the learning and regularization rates).
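For readers unfamiliar with the evaluation described above, here is a minimal Python sketch of a random train/validation/test split and of the RMSE metric. The helper names (`three_way_split`, `rmse`) and the 10% split fractions are assumptions; the issue does not say how the actual split was made or which proportions were used.

```python
import math
import random


def three_way_split(records, valid_frac=0.1, test_frac=0.1, seed=0):
    """Randomly partition records into (train, validation, test) lists.

    The fractions are hypothetical defaults, not values from the issue.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

The idea is to pick the learning and regularization rates by comparing `rmse` on the validation part only, and to report the final figure once on the held-out test part.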
> There is no modification in the existing Mahout code, except the added lines in driver.class.props for command-line tools. However, that became a huge patch with dozens of new source files.
> These algorithms are heavily inspired by various influential Recommender System papers, especially Yehuda Koren's. For example, the Ordinal model is from Koren's OrdRec paper, except that the cuts are not user-specific but global.
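To make the "global cuts" remark concrete, the sketch below computes rating probabilities under a textbook proportional odds model, where P(y <= k) = sigmoid(t_k - s) for a latent score s and one set of increasing thresholds t_k shared by all users. This is a generic formulation rather than the patch's implementation; the function name `ordinal_probs` and the example thresholds are assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_probs(s, cuts):
    """Return [P(y = 1), ..., P(y = K)] with K = len(cuts) + 1.

    `s` is the latent score (for example a user-item factor dot product)
    and `cuts` are strictly increasing thresholds shared by all users.
    """
    if any(b <= a for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cuts must be strictly increasing")
    # Cumulative probabilities P(y <= k); the last level always sums to 1.
    cumulative = [sigmoid(t - s) for t in cuts] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)
        prev = c
    return probs
```

For instance, `ordinal_probs(0.0, [-1.0, 0.0, 1.0])` yields four probabilities that sum to 1, and raising the score shifts probability mass toward the higher rating levels. A user-specific variant would keep a separate `cuts` list per user instead of one shared list.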
> Left for future:
> # The core algorithms are tested, but there are probably some parts that those tests do not cover. I saw many of those in action without problems, but I am going to add new tests regularly.
> # Not all algorithms have been tried on appropriate datasets, and they may need some improvement. However, I am also using the algorithms for my M.Sc. thesis, which means I will eventually submit more experiments. As the experimenting infrastructure exists, I believe the community may provide more experiments, too.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira