Posted to dev@mahout.apache.org by Gokhan Capan <gk...@gmail.com> on 2012/09/06 17:04:42 UTC

SGD Based Recommender Contribution Proposal

Dear Mahout community,

I would like to introduce a set of tools for recommender systems that I
implemented as part of my MSc thesis. The work was inspired by our
conversations on the user list, and I tried to keep it close to the existing
Taste framework so that it could be contributed to Mahout.

The library is available at
github.com/gcapan/recommender<http://github.com/gcapan>.


The library contains Stochastic Gradient Descent based learning algorithms
for Matrix Factorization based recommendation.

Core features of the library are listed below:

1-  It handles different recommendation targets (feedback), namely:
    - Standard numerical recommendation with OLS Regression
    - Binary recommendation with Logistic Regression
    - Multinomial recommendation with Softmax Regression
    - Ordinal recommendation with Proportional Odds Model
    - Predicting counts with Poisson Regression (still experimental)

2- It may use side information from users and items if available

3- It may leverage what I call dynamic side information: features whose
values are determined at feedback time (e.g. day of week for its possible
effect on people's choices, proximity for location-aware recommendation,
etc.)

4- It is an online learning algorithm and thus scalable. However, the model
is currently stored in memory. I plan to extend it to store the model in
HBase, too.
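
To make the feature list above more concrete, here is a minimal sketch of
SGD-based matrix factorization for the numerical (squared error) target. It
is not the library's code: the function names, the learning rate, the
regularization weight and the tiny data set used to try it are all
assumptions made for illustration, and side information is left out.

```python
import math
import random


def train_mf(ratings, num_factors=2, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Fit user and item factor vectors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    # Sort the ids so that the random initialization is reproducible.
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the ratings in a random order each epoch
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for f in range(num_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on the squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def predict(p, q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def rmse(p, q, ratings):
    """Root mean squared error of the predictions on the given ratings."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))
```

Because each update touches only one user vector and one item vector, the
same loop body can be applied to a single new rating as it arrives, which is
the sense in which the approach is online.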


The recommenders implement Mahout's Recommender interface. For
experiments, I have implemented a GenericIncrementalDataModel (in memory)
and List-based PreferenceArrays.

I tried to use Mahout's data structures where available. For example,
factor vectors and side info vectors are in Mahout's vector format.

These algorithms are heavily inspired by various influential recommender
system papers, especially those of Yehuda Koren. For example, the ordinal
model is from Koren's OrdRec paper, except that the cuts are global rather
than user-specific.
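
As an illustration of the ordinal model with global cuts, the sketch below
turns a single predicted score into a probability for each rating level,
using the cumulative form P(rating <= k) = sigmoid(cut_k - score). The
function names and the cut values used to try it are assumptions for this
example, not taken from the library.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probabilities(score, cuts):
    """Return P(rating = k) for k = 1 .. len(cuts) + 1.

    `cuts` must be increasing and is shared by all users (global cuts).
    P(rating <= k) = sigmoid(cuts[k - 1] - score), and the last level
    closes the distribution at 1.
    """
    cumulative = [sigmoid(t - score) for t in cuts] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        # Probability of one level is the difference of adjacent cumulatives.
        probabilities.append(c - previous)
        previous = c
    return probabilities


def expected_rating(score, cuts):
    """Mean rating implied by the level probabilities."""
    probs = ordinal_probabilities(score, cuts)
    return sum(k * pr for k, pr in enumerate(probs, start=1))
```

With four cuts this gives five rating levels; raising the score moves
probability mass toward the higher levels, while the cuts stay the same for
every user.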

I tried the numerical recommender on the MovieLens-1M dataset, and it
achieved around 0.851 RMSE with 150 factors and 30 iterations.

The code is tested, but not fully documented.

With some effort, the code can be integrated into Mahout. If it has the
potential to benefit Mahout users, I will be happy to contribute it to the
ASF with your guidance.

Any feedback is appreciated.

Regards

-- 
Gokhan

Re: SGD Based Recommender Contribution Proposal

Posted by Gokhan Capan <gk...@gmail.com>.
Hi,

I've submitted the patch to
https://issues.apache.org/jira/browse/MAHOUT-1069

Regards

On Sun, Sep 9, 2012 at 10:27 PM, Ted Dunning <te...@gmail.com> wrote:

> Great.
>
> If the update has a huge impact on existing code, can you break it into
> manageable pieces?
>
> If it is just an addition, having a big blob of stuff is probably fine.
>
> On Sun, Sep 9, 2012 at 7:01 AM, Gokhan Capan <gk...@gmail.com> wrote:
>
> > On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > This sounds pretty exciting.  Beyond that, it is hard to say much.
> > >
> > > Can you say a bit more about how you would see introducing the code
> into
> > > Mahout?
> > >
> >
> > Ted, I've forked apache/mahout at github, and I will merge the library
> into
> > mahout. I believe in a week I will be able to add documentation and
> mahout
> > jobs for experiments and start submitting patches to JIRA.
> >
>



-- 
Gokhan

Re: SGD Based Recommender Contribution Proposal

Posted by Ted Dunning <te...@gmail.com>.
This actually sounds pretty good, at least in advance of actually looking
at the code.

On Fri, Oct 12, 2012 at 2:15 AM, Sean Owen <sr...@gmail.com> wrote:

> (Bug fixes are good, if you can separate those out, those should not
> be controversial.)
>
> I am still skeptical of new blobs of code, as these have been a big
> problem for the project. If it's clean, follows existing code and
> APIs, and you are willing to actively support it, it may be
> un-harmful.
>
> I am not going to be able to work on Mahout in any significant way any
> more, so would not feel right committing this myself, especially if
> the expectation was that I would be able to help modify and support
> these changes. But, leave it to other committers to take that up. It
> sounds like you have made a lot of effort to weave it in nicely.
>
>
> > Now it is integrable to Mahout, and can work with Mahout's existing
> > Recommender interface. It does not modify any existing code, except a
> couple
> > of additional lines in driver.class.props, which define a few commandline
> > utilities I find useful while experimenting a recommender.
>

Re: SGD Based Recommender Contribution Proposal

Posted by Sean Owen <sr...@gmail.com>.
(Bug fixes are good, if you can separate those out, those should not
be controversial.)

I am still skeptical of new blobs of code, as these have been a big
problem for the project. If it's clean, follows existing code and
APIs, and you are willing to actively support it, it may be
un-harmful.

I am not going to be able to work on Mahout in any significant way any
more, so would not feel right committing this myself, especially if
the expectation was that I would be able to help modify and support
these changes. But, leave it to other committers to take that up. It
sounds like you have made a lot of effort to weave it in nicely.


> Now it is integrable to Mahout, and can work with Mahout's existing
> Recommender interface. It does not modify any existing code, except a couple
> of additional lines in driver.class.props, which define a few commandline
> utilities I find useful while experimenting a recommender.

Re: SGD Based Recommender Contribution Proposal

Posted by Gokhan Capan <gk...@gmail.com>.
On Sun, Sep 9, 2012 at 10:27 PM, Ted Dunning <te...@gmail.com> wrote:

> Great.
>
> If the update has a huge impact on existing code, can you break it into
> manageable pieces?
>
> If it is just an addition, having a big blob of stuff is probably fine.


Now it can be integrated into Mahout, and it works with Mahout's existing
Recommender interface. It does not modify any existing code, except for a
couple of additional lines in driver.class.props, which define a few
command-line utilities I find useful while experimenting with a recommender.

By the way, I found a few minor bugs and updated the patch.

Did you have any chance to look at this?


Secondly, I would like to bump the thread to trigger a discussion on this.
Sean raised some concerns about the patch (available as a comment on the
JIRA page).

Quoting Sean's comment:
"I imagine this is all great work. As I commented off-list, it is a big
enough and even different enough beast that it feels like it should be a
separate project. The Mahout code base is already uneven and sprawling and
I think this would exacerbate that – and not generate much "synergy" worth
the effort of integration."

I understand all of these points, and want to provide a general response
that may clarify some of the points Sean made.

Basically, it adds an online version of existing Mahout recommendation
capabilities. Learning an MF-based recommender with Alternating Least
Squares already exists in Mahout, and this is the SGD-based version. The
different-targets approach is just a set of wrappers on those linear models
(the same idea as Generalized Linear Models). Adding side info is optional,
and it may be beneficial when there is a cold-start issue.
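
To illustrate the "wrappers on linear models" point, the sketch below shows
how one SGD step can serve several targets by swapping only the mean
function, as in Generalized Linear Models with canonical links. The names
`MEAN_FUNCTIONS` and `sgd_step` and the learning rate are assumptions for
this example, not the library's API. In the factorization setting,
`features` would be the factor vector of the other side (the item vector
when a user vector is being updated).

```python
import math

# Mean function per target type; the linear predictor z is shared.
MEAN_FUNCTIONS = {
    "numerical": lambda z: z,                         # identity link (OLS)
    "binary": lambda z: 1.0 / (1.0 + math.exp(-z)),   # logit link
    "count": lambda z: math.exp(z),                   # log link (Poisson)
}


def sgd_step(weights, features, target, kind, lr=0.1):
    """One SGD step on the log-likelihood of a canonical-link GLM.

    For all three targets the gradient with respect to the weights has
    the same form, (target - mean) * features, so only the mean function
    changes between them.
    """
    z = sum(w * x for w, x in zip(weights, features))
    mean = MEAN_FUNCTIONS[kind](z)
    return [w + lr * (target - mean) * x for w, x in zip(weights, features)]
```

The multinomial and ordinal targets need a vector of class probabilities
rather than a single mean, so they do not fit this one-line form, but the
wrapping idea is the same.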

Additionally, the OnlineFactorizationRecommender extends the
AbstractRecommender, and the FactorizationAwareDataModel is a Mahout
DataModel composed with a base DataModel that is capable of adding new
ratings.
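
The composition described above can be sketched roughly as follows. Both
classes are hypothetical stand-ins written for this example; they are not
Mahout's DataModel API and not the classes in the patch. The point is only
that the wrapper delegates storage to a base model that accepts new ratings
and tells the learner about each one.

```python
class InMemoryRatings:
    """Hypothetical base model: stores ratings and accepts new ones."""

    def __init__(self):
        self._ratings = {}

    def set_rating(self, user, item, value):
        self._ratings[(user, item)] = value

    def get_rating(self, user, item):
        # Returns None when the user has not rated the item.
        return self._ratings.get((user, item))


class NotifyingRatings:
    """Hypothetical wrapper composed with a base model.

    Reads are delegated to the base model; every new rating is stored
    there first and then passed to a callback, which is where an online
    learner could apply its SGD update.
    """

    def __init__(self, base, on_new_rating):
        self._base = base
        self._on_new_rating = on_new_rating

    def set_rating(self, user, item, value):
        self._base.set_rating(user, item, value)
        self._on_new_rating(user, item, value)

    def get_rating(self, user, item):
        return self._base.get_rating(user, item)
```

Keeping the learner behind a callback means the base model can be swapped
(in memory now, another store later) without touching the update logic.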

Beyond that, I remember the initiative Ted started following Menon and
Elkan's 'Dyadic Prediction Using a Latent Feature Log-Linear Model' paper.
At first I intended to improve Ted's initial implementation, but then I
started a separate implementation so that the code could be integrated into
Taste from the very beginning. What I mean is that the two approaches are
really similar.

The code is already integrated, and may be one of the many recommender
options available to a user. Finally, I volunteer to keep the code
integrated and working, improve it based on suggestions, and provide
documentation on usage and details.

The reason I prefer contributing to Mahout over starting a separate project
is this: I am familiar with the Mahout library, the code already depends on
Mahout, and the goal of the project is to be used by people. Mahout already
attracts plenty of users and developers, which means the code would be used
by more people, and with reviews it may be fixed and improved faster.

Regards

>
> On Sun, Sep 9, 2012 at 7:01 AM, Gokhan Capan <gk...@gmail.com> wrote:
>
> > On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > This sounds pretty exciting.  Beyond that, it is hard to say much.
> > >
> > > Can you say a bit more about how you would see introducing the code
> into
> > > Mahout?
> > >
> >
> > Ted, I've forked apache/mahout at github, and I will merge the library
> into
> > mahout. I believe in a week I will be able to add documentation and
> mahout
> > jobs for experiments and start submitting patches to JIRA.
> >
>
-- 
Gokhan

Re: SGD Based Recommender Contribution Proposal

Posted by Ted Dunning <te...@gmail.com>.
Great.

If the update has a huge impact on existing code, can you break it into
manageable pieces?

If it is just an addition, having a big blob of stuff is probably fine.

On Sun, Sep 9, 2012 at 7:01 AM, Gokhan Capan <gk...@gmail.com> wrote:

> On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > This sounds pretty exciting.  Beyond that, it is hard to say much.
> >
> > Can you say a bit more about how you would see introducing the code into
> > Mahout?
> >
>
> Ted, I've forked apache/mahout at github, and I will merge the library into
> mahout. I believe in a week I will be able to add documentation and mahout
> jobs for experiments and start submitting patches to JIRA.
>

Re: SGD Based Recommender Contribution Proposal

Posted by Gokhan Capan <gk...@gmail.com>.
On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <te...@gmail.com> wrote:

> This sounds pretty exciting.  Beyond that, it is hard to say much.
>
> Can you say a bit more about how you would see introducing the code into
> Mahout?
>

Ted, I've forked apache/mahout on GitHub, and I will merge the library into
Mahout. I believe that within a week I will be able to add documentation and
Mahout jobs for experiments and start submitting patches to JIRA.


> On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan <gk...@gmail.com> wrote:
>
> > By the way, I want to mention that my thesis is advised by Ozgur
> Yilmazel,
> > who is a founding member of the Mahout project. I conducted this study
> and
> > kept the implementation integrable to Mahout with his guidance.
> >
> > On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gk...@gmail.com> wrote:
> >
> > > Dear Mahout community,
> > >
> > > I would like to introduce a set of tools for recommender systems those
> > are
> > > implemented as a part of my MSc. thesis. This is inspired by our
> > > conversations in the user-list, and I tried to stick it to existing
> Taste
> > > framework for possible contribution to Mahout.
> > >
> > > The library is available at github.com/gcapan/recommender<
> > http://github.com/gcapan>.
> > >
> > >
> > > The library contains Stochastic Gradient Descent based learning
> > algorithms
> > > for Matrix Factorization based recommendation.
> > >
> > > Core features of the library are listed below:
> > >
> > > 1-  It handles different recommendation targets (feedback), namely;
> > >     - Standard numerical recommendation with OLS Regression
> > >     - Binary recommendation with Logistic Regression
> > >     - Multinomial recommendation with Softmax Regression
> > >     - Ordinal recommendation with Proportional Odds Model
> > >     - Predicting counts with Poisson Regression (still experimental)
> > >
> > > 2- It may use side information from users and items if available
> > >
> > > 3- It may leverage the dynamic side information (this is what I called
> > > it), which means the features whose values are determined at feedback
> > time
> > > (e.g. day of week for possible effect on people's choices, proximity
> for
> > > location aware recommendation, etc.)
> > >
> > > 4- It is an online learning algorithm thus scalable. However, currently
> > > the model is stored in memory. I plan to extend it to store the model
> in
> > > HBase, too.
> > >
> > >
> > > The recommenders implement the Mahout's Recommender interface. For
> > > experiments, I have implemented a GenericIncrementalDataModel (in
> > memory),
> > > and List based PreferenceArrays.
> > >
> > > I tried to use Mahout's data structures where available. For example,
> > > factor vectors and side info vectors are in Mahout's vector format.
> > >
> > > These algorithms are highly inspired by various influential Recommender
> > > System papers, especially from Yehuda Koren. For example, the Ordinal
> > model
> > > is from Koren's OrdRec paper, except the cuts are not user-specific but
> > > global.
> > >
> > > I tried the numerical recommender on MovieLens-1M dataset, and it
> > achieved
> > > around 0.851 RMSE with 150 factors and 30 iterations.
> > >
> > > The code is tested, but not fully documented.
> > >
> > > With some effort, the code can be integrated into Mahout. If it has a
> > > potential to be beneficial for Mahout users, I will be happy to
> > contribute
> > > it to ASF with your guidance.
> > >
> > > Any feedback is appreciated.
> > >
> > > Regards
> > >
> > > --
> > > Gokhan
> >
> >
> >
> >
> > --
> > Gokhan
> >
>



-- 
Gokhan

Re: SGD Based Recommender Contribution Proposal

Posted by Ted Dunning <te...@gmail.com>.
This sounds pretty exciting.  Beyond that, it is hard to say much.

Can you say a bit more about how you would see introducing the code into
Mahout?

On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan <gk...@gmail.com> wrote:

> By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
> who is a founding member of the Mahout project. I conducted this study and
> kept the implementation integrable to Mahout with his guidance.
>
> On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gk...@gmail.com> wrote:
>
> > Dear Mahout community,
> >
> > I would like to introduce a set of tools for recommender systems those
> are
> > implemented as a part of my MSc. thesis. This is inspired by our
> > conversations in the user-list, and I tried to stick it to existing Taste
> > framework for possible contribution to Mahout.
> >
> > The library is available at github.com/gcapan/recommender<
> http://github.com/gcapan>.
> >
> >
> > The library contains Stochastic Gradient Descent based learning
> algorithms
> > for Matrix Factorization based recommendation.
> >
> > Core features of the library are listed below:
> >
> > 1-  It handles different recommendation targets (feedback), namely;
> >     - Standard numerical recommendation with OLS Regression
> >     - Binary recommendation with Logistic Regression
> >     - Multinomial recommendation with Softmax Regression
> >     - Ordinal recommendation with Proportional Odds Model
> >     - Predicting counts with Poisson Regression (still experimental)
> >
> > 2- It may use side information from users and items if available
> >
> > 3- It may leverage the dynamic side information (this is what I called
> > it), which means the features whose values are determined at feedback
> time
> > (e.g. day of week for possible effect on people's choices, proximity for
> > location aware recommendation, etc.)
> >
> > 4- It is an online learning algorithm thus scalable. However, currently
> > the model is stored in memory. I plan to extend it to store the model in
> > HBase, too.
> >
> >
> > The recommenders implement the Mahout's Recommender interface. For
> > experiments, I have implemented a GenericIncrementalDataModel (in
> memory),
> > and List based PreferenceArrays.
> >
> > I tried to use Mahout's data structures where available. For example,
> > factor vectors and side info vectors are in Mahout's vector format.
> >
> > These algorithms are highly inspired by various influential Recommender
> > System papers, especially from Yehuda Koren. For example, the Ordinal
> model
> > is from Koren's OrdRec paper, except the cuts are not user-specific but
> > global.
> >
> > I tried the numerical recommender on MovieLens-1M dataset, and it
> achieved
> > around 0.851 RMSE with 150 factors and 30 iterations.
> >
> > The code is tested, but not fully documented.
> >
> > With some effort, the code can be integrated into Mahout. If it has a
> > potential to be beneficial for Mahout users, I will be happy to
> contribute
> > it to ASF with your guidance.
> >
> > Any feedback is appreciated.
> >
> > Regards
> >
> > --
> > Gokhan
>
>
>
>
> --
> Gokhan
>

Re: SGD Based Recommender Contribution Proposal

Posted by Gokhan Capan <gk...@gmail.com>.
By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
who is a founding member of the Mahout project. I conducted this study and
kept the implementation integrable to Mahout with his guidance.

On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gk...@gmail.com> wrote:

> Dear Mahout community,
>
> I would like to introduce a set of tools for recommender systems those are
> implemented as a part of my MSc. thesis. This is inspired by our
> conversations in the user-list, and I tried to stick it to existing Taste
> framework for possible contribution to Mahout.
>
> The library is available at github.com/gcapan/recommender<http://github.com/gcapan>.
>
>
> The library contains Stochastic Gradient Descent based learning algorithms
> for Matrix Factorization based recommendation.
>
> Core features of the library are listed below:
>
> 1-  It handles different recommendation targets (feedback), namely;
>     - Standard numerical recommendation with OLS Regression
>     - Binary recommendation with Logistic Regression
>     - Multinomial recommendation with Softmax Regression
>     - Ordinal recommendation with Proportional Odds Model
>     - Predicting counts with Poisson Regression (still experimental)
>
> 2- It may use side information from users and items if available
>
> 3- It may leverage the dynamic side information (this is what I called
> it), which means the features whose values are determined at feedback time
> (e.g. day of week for possible effect on people's choices, proximity for
> location aware recommendation, etc.)
>
> 4- It is an online learning algorithm thus scalable. However, currently
> the model is stored in memory. I plan to extend it to store the model in
> HBase, too.
>
>
> The recommenders implement the Mahout's Recommender interface. For
> experiments, I have implemented a GenericIncrementalDataModel (in memory),
> and List based PreferenceArrays.
>
> I tried to use Mahout's data structures where available. For example,
> factor vectors and side info vectors are in Mahout's vector format.
>
> These algorithms are highly inspired by various influential Recommender
> System papers, especially from Yehuda Koren. For example, the Ordinal model
> is from Koren's OrdRec paper, except the cuts are not user-specific but
> global.
>
> I tried the numerical recommender on MovieLens-1M dataset, and it achieved
> around 0.851 RMSE with 150 factors and 30 iterations.
>
> The code is tested, but not fully documented.
>
> With some effort, the code can be integrated into Mahout. If it has a
> potential to be beneficial for Mahout users, I will be happy to contribute
> it to ASF with your guidance.
>
> Any feedback is appreciated.
>
> Regards
>
> --
> Gokhan




-- 
Gokhan
