Posted to dev@mahout.apache.org by Nkechi Nnadi <nk...@gmail.com> on 2013/04/01 19:45:41 UTC

Re: factorization machines as new project

Hello,

I'm a long-time lurker.  I would be interested in implementing these.  I
thought I would get my feet wet by contributing tutorials to the wiki,
since I have used Mahout for recommendation and clustering in my
dissertation.  I have never contributed code before and I would love to
start now.

-Nkechi


On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <ro...@gmail.com> wrote:

> FMs work really well for a whole range of things. Having implemented them
> myself, I can extend my services as a reviewer if anyone is willing to
> start on it.
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <te...@gmail.com> wrote:
>
> > Relative to Dan's recent mention of SOM as a possible new project, here
> > are slides from KDD Cup 2012 in which Steffen Rendle describes how he did
> > using a very straightforward implementation of Factorization Machines
> > [1,2].
> >
> > FMs are interesting in the context of Mahout because they can be used in
> > a wide variety of settings, including recommendation and targeting, and
> > because they have very good performance on a number of tasks.
> >
> > I should mention that Robin was the one who first mentioned FMs to me.
> >
> > The KDD 2012 competition [3] is of interest in any case because it
> > provides a large amount of realistic data for commercially important
> > problems.
> >
> > [1] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
> > [2] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
> > [3] http://www.kddcup2012.org/
> >
>
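For readers new to FMs: the standard degree-2 model introduced by Rendle predicts y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the pairwise term can be computed in O(k * nnz(x)) as 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]. Below is a minimal, dependency-free sketch of that prediction. The class name FmModel and the Map-based sparse input are assumptions for illustration only; this is not Mahout's API or the implementation discussed later in the thread.

```java
import java.util.Map;

/**
 * Minimal sketch of degree-2 factorization machine prediction.
 * Hypothetical class, not Mahout's API; sparse input is a map from
 * feature index to value.
 */
final class FmModel {
    private final double w0;     // global bias
    private final double[] w;    // linear weights, one per feature
    private final double[][] v;  // factors: v[i] is the k-dimensional vector of feature i

    FmModel(double w0, double[] w, double[][] v) {
        if (w.length != v.length) {
            throw new IllegalArgumentException("w and v must cover the same features");
        }
        this.w0 = w0;
        this.w = w;
        this.v = v;
    }

    double predict(Map<Integer, Double> x) {
        double y = w0;
        // Linear part: sum_i w_i * x_i over the non-zero features only.
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            y += w[e.getKey()] * e.getValue();
        }
        // Pairwise part in O(k * nnz(x)):
        // 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
        int k = v.length == 0 ? 0 : v[0].length;
        for (int f = 0; f < k; f++) {
            double sum = 0.0;
            double sumOfSquares = 0.0;
            for (Map.Entry<Integer, Double> e : x.entrySet()) {
                double term = v[e.getKey()][f] * e.getValue();
                sum += term;
                sumOfSquares += term * term;
            }
            y += 0.5 * (sum * sum - sumOfSquares);
        }
        return y;
    }
}
```

For example, with w0 = 0.5, w = {1, 2, 3}, v = {{1, 0}, {0, 1}, {2, 1}} and features 0 and 2 set to 1.0, the linear part is 4.5 and the only active pair contributes <v_0, v_2> = 2, so the prediction is 6.5.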

Re: factorization machines as new project

Posted by Ted Dunning <te...@gmail.com>.

Awesome progress.

Thanks much!


On Sun, Apr 14, 2013 at 1:27 PM, Gokhan Capan <gk...@gmail.com> wrote:

> Ok then, now my roadmap is:
>
> Tomorrow I will re-submit the Lucene Matrix patch with support for
> multiple fields (Probably SRM sub-classed version of multi-matrices after
> testing it).
>
> Multi-vectors is another thing that the community may be interested in
> (maybe to help them to assign a row of multi-matrices), I can submit it
> upon request after asking in dev-list.
>
> This week I will refactor the factorization machine with SGD
> implementation to make it operate on a single matrix as input, and then try
> it on a dataset. Then we can talk on submitting a diff for the algorithm.
> (And possible use cases for the algorithm, e.g. integration with
> Recommender interface)
>
> Then the persistent version of the LuceneMatrix and an InputFormat on top
> of it will come.
>
> Ted, Robin,
> Thank you for all responses, all helped me a lot.

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

OK then, my roadmap is now:

Tomorrow I will re-submit the Lucene Matrix patch with support for multiple
fields (probably the SRM-subclassed version of multi-matrices, after
testing it).

Multi-vectors are another thing the community may be interested in (maybe
to help assign a row of multi-matrices); I can submit them on request after
asking on the dev list.

This week I will refactor the factorization machine SGD implementation to
operate on a single input matrix, and then try it on a dataset. Then we can
talk about submitting a diff for the algorithm (and possible use cases for
it, e.g. integration with the Recommender interface).

After that, the persistent version of LuceneMatrix and an InputFormat on
top of it will come.

Ted, Robin,
Thank you for all the responses; they all helped me a lot.

On Sun, Apr 14, 2013 at 11:05 PM, Ted Dunning <te...@gmail.com> wrote:

>
> On Sun, Apr 14, 2013 at 11:59 AM, Gokhan Capan <gk...@gmail.com> wrote:
>
>> - I strongly suspect that you don't need to implement VectorSuperView.
>>>  Won't the normal handling of viewRow in AbstractMatrix work here?  Speed
>>> may be an issue, but all speed questions should be decided by measurements.
>>>
>> It was because the iterateNonZero didn't work, and this was intended to
>> work on mostly sparse matrices. I think (but I'm not sure yet) making this
>> ConcatenatedMatrix a direct subclass of SparseRowMatrix would solve this
>> problem, that may be an option. (I personally needed this multi-vectors
>> anyway, so I implemented it)
>>
>
> This is an interesting option (sub-classing from SRM).
>
> Having the multi-vectors is nice as you say.  My only point was that they
> weren't necessarily implied by the need for row views.  I am not sure which
> would be faster in the end.
>
>
>

-- 
Gokhan
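As a rough illustration of what one SGD step for an FM looks like on a single sparse row (the unit the planned single-matrix refactoring would iterate over), here is a sketch using squared loss with L2 regularization and the standard FM gradients: dy/dw_i = x_i and dy/dv_{i,f} = x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2. The class FmSgdSketch, its hyperparameters, the initialization and the choice of loss are all assumptions; this is not the code in the linked branch.

```java
import java.util.Map;
import java.util.Random;

/**
 * Sketch of SGD for a degree-2 factorization machine with squared loss
 * and L2 regularization. Hypothetical class; not the implementation in the thread.
 */
final class FmSgdSketch {
    private double w0;
    private final double[] w;
    private final double[][] v;
    private final int k;
    private final double learningRate;
    private final double lambda;

    FmSgdSketch(int numFeatures, int numFactors, double learningRate, double lambda, long seed) {
        this.w = new double[numFeatures];
        this.v = new double[numFeatures][numFactors];
        this.k = numFactors;
        this.learningRate = learningRate;
        this.lambda = lambda;
        Random random = new Random(seed);
        // Small random factors break the symmetry between features.
        for (double[] row : v) {
            for (int f = 0; f < numFactors; f++) {
                row[f] = 0.01 * random.nextGaussian();
            }
        }
    }

    /** Predicts y(x) and fills q[f] = sum_i v_if x_i, which the gradient reuses. */
    private double predict(Map<Integer, Double> x, double[] q) {
        double y = w0;
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            y += w[e.getKey()] * e.getValue();
        }
        for (int f = 0; f < k; f++) {
            double sum = 0.0;
            double sumOfSquares = 0.0;
            for (Map.Entry<Integer, Double> e : x.entrySet()) {
                double term = v[e.getKey()][f] * e.getValue();
                sum += term;
                sumOfSquares += term * term;
            }
            q[f] = sum;
            y += 0.5 * (sum * sum - sumOfSquares);
        }
        return y;
    }

    double predict(Map<Integer, Double> x) {
        return predict(x, new double[k]);
    }

    /** One SGD step on 0.5 * (y(x) - target)^2; returns the error before the update. */
    double step(Map<Integer, Double> x, double target) {
        double[] q = new double[k];
        double err = predict(x, q) - target;   // dLoss/dy for squared loss
        w0 -= learningRate * err;              // bias is left unregularized here
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            int i = e.getKey();
            double xi = e.getValue();
            w[i] -= learningRate * (err * xi + lambda * w[i]);
            for (int f = 0; f < k; f++) {
                // dy/dv_if = x_i * q_f - v_if * x_i^2, with q_f taken from before this step
                double grad = xi * q[f] - v[i][f] * xi * xi;
                v[i][f] -= learningRate * (err * grad + lambda * v[i][f]);
            }
        }
        return err;
    }
}
```

Reusing the per-factor sums q_f from the forward pass keeps each step at O(k * nnz(x)), the same cost as a prediction.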

Re: factorization machines as new project

Posted by Ted Dunning <te...@gmail.com>.

On Sun, Apr 14, 2013 at 11:59 AM, Gokhan Capan <gk...@gmail.com> wrote:

> - I strongly suspect that you don't need to implement VectorSuperView.
>>  Won't the normal handling of viewRow in AbstractMatrix work here?  Speed
>> may be an issue, but all speed questions should be decided by measurements.
>>
> It was because the iterateNonZero didn't work, and this was intended to
> work on mostly sparse matrices. I think (but I'm not sure yet) making this
> ConcatenatedMatrix a direct subclass of SparseRowMatrix would solve this
> problem, that may be an option. (I personally needed this multi-vectors
> anyway, so I implemented it)
>

This is an interesting option (sub-classing from SRM).

Having the multi-vectors is nice as you say.  My only point was that they
weren't necessarily implied by the need for row views.  I am not sure which
would be faster in the end.
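The multi-vector question above comes down to exposing several sparse vectors as one vector whose non-zeros can be iterated with shifted indices. A small dependency-free sketch follows, assuming a hypothetical SparseVec type (a fixed size plus sorted non-zeros) instead of Mahout's Vector classes; both class names are made up for illustration. It says nothing about which approach is faster, which, as noted in the thread, should be decided by measurement.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a sparse vector: a fixed size plus sorted non-zero entries. */
final class SparseVec {
    final int size;
    final SortedMap<Integer, Double> nonZeros = new TreeMap<>();

    SparseVec(int size) {
        this.size = size;
    }

    SparseVec set(int index, double value) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (value == 0.0) {
            nonZeros.remove(index);   // keep only true non-zeros
        } else {
            nonZeros.put(index, value);
        }
        return this;
    }
}

/** Read-only view presenting several sparse vectors as one, without copying them. */
final class ConcatenatedVectorView {
    private final SparseVec[] parts;
    private final int[] offsets;   // offsets[p] = global index of the first element of parts[p]

    ConcatenatedVectorView(SparseVec... parts) {
        this.parts = parts;
        this.offsets = new int[parts.length + 1];
        for (int p = 0; p < parts.length; p++) {
            offsets[p + 1] = offsets[p] + parts[p].size;
        }
    }

    int size() {
        return offsets[parts.length];
    }

    double get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int p = 0;
        while (offsets[p + 1] <= index) {   // skip parts (including empty ones) that end before index
            p++;
        }
        Double value = parts[p].nonZeros.get(index - offsets[p]);
        return value == null ? 0.0 : value;
    }

    /** Non-zeros in ascending global index order; materialized into a list for simplicity. */
    List<Map.Entry<Integer, Double>> nonZeros() {
        List<Map.Entry<Integer, Double>> out = new ArrayList<>();
        for (int p = 0; p < parts.length; p++) {
            for (Map.Entry<Integer, Double> e : parts[p].nonZeros.entrySet()) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(offsets[p] + e.getKey(), e.getValue()));
            }
        }
        return out;
    }
}
```

For instance, a user block of size 3, an item block of size 4 and a time block of size 2 become one 9-dimensional row; a non-zero at item index 2 shows up at global index 5. A production version would iterate lazily instead of building a list.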

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

Thanks for the quick response. My responses are inline.


On Sun, Apr 14, 2013 at 7:46 PM, Ted Dunning <te...@gmail.com> wrote:

>
> This is a good start.  I think that there are some things that bug me in
> the implementation.
>
> - assignColumn should work the same way that viewColumn does.
>
Done.

>
> - the machinery that finds the component matrix for a particular column
> should be separated out as a private method.
>
Done.

>
> - I think that the ColumnSizeCalculator class should go away.  You don't
> need an extra object there, just a method.
>
Done (with a private static method).

>
> - I strongly suspect that you don't need to implement VectorSuperView.
>  Won't the normal handling of viewRow in AbstractMatrix work here?  Speed
> may be an issue, but all speed questions should be decided by measurements.
>
It was because iterateNonZero didn't work, and this was intended to work
on mostly sparse matrices. I think (but I'm not sure yet) that making this
ConcatenatedMatrix a direct subclass of SparseRowMatrix would solve the
problem, so that may be an option. (I personally needed these multi-vectors
anyway, so I implemented them.)

>
> - viewPart and like() are important.
>
I intentionally left those unsupported because I wasn't sure what they
should return. The original AbstractMatrix#viewPart would again cause
problems when iterating over fetched rows (subclassing SparseRowMatrix would
again have solved this; I'm going to think about it). And I wasn't sure what
like(rows, columns) should return. A single matrix?

>
> - set and get should not be implemented on top of viewRow.  That will kill
> performance.
>
Fixed.

>
> - the public MatrixSuperView(int rowSize, int columnSize, Matrix[] matrices)
> constructor makes no sense to me to expose to users.  It should be inlined
> and go away.
>
Gone away.

>
> - the coding style in terms of white space is erratic.  Your IDE should
> fix this.
>
Done.



-- 
Gokhan

Re: factorization machines as new project

Posted by Ted Dunning <te...@gmail.com>.

This is a good start.  I think that there are some things that bug me in
the implementation.

- assignColumn should work the same way that viewColumn does.

- the machinery that finds the component matrix for a particular column
should be separated out as a private method.

- I think that the ColumnSizeCalculator class should go away.  You don't
need an extra object there, just a method.

- I strongly suspect that you don't need to implement VectorSuperView.
Won't the normal handling of viewRow in AbstractMatrix work here?  Speed
may be an issue, but all speed questions should be decided by measurements.

- viewPart and like() are important.

- set and get should not be implemented on top of viewRow.  That will kill
performance.

- the public MatrixSuperView(int rowSize, int columnSize, Matrix[] matrices)
constructor makes no sense to me to expose to users.  It should be inlined
and go away.

- the coding style in terms of white space is erratic.  Your IDE should fix
this.
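Several of these review points (a private method that finds the component matrix for a column, a plain method instead of a ColumnSizeCalculator object, and get/set that do not go through viewRow) can be illustrated with a small dependency-free sketch. It assumes plain double[][] components that share a row count, and ConcatenatedMatrixSketch and its methods are hypothetical names; it is not the MatrixSuperView/ConcatenatedMatrix code from the linked branch.

```java
/**
 * Hypothetical sketch of a column-wise concatenation of row-aligned matrices.
 * get/set go straight to the owning component instead of building a row view.
 */
final class ConcatenatedMatrixSketch {
    private final double[][][] components;  // components[c][row][localColumn]
    private final int rows;
    private final int[] columnOffsets;      // columnOffsets[c] = first global column of component c

    ConcatenatedMatrixSketch(double[][]... components) {
        if (components.length == 0) {
            throw new IllegalArgumentException("need at least one component");
        }
        this.rows = components[0].length;
        for (double[][] m : components) {
            if (m.length != rows) {
                throw new IllegalArgumentException("all components must have the same row count");
            }
        }
        this.components = components;
        this.columnOffsets = columnOffsets(components);
    }

    /** A plain static method where a separate size-calculator object is not needed. */
    private static int[] columnOffsets(double[][][] components) {
        int[] offsets = new int[components.length + 1];
        for (int c = 0; c < components.length; c++) {
            int columns = components[c].length == 0 ? 0 : components[c][0].length;
            offsets[c + 1] = offsets[c] + columns;
        }
        return offsets;
    }

    int rowSize() {
        return rows;
    }

    int columnSize() {
        return columnOffsets[components.length];
    }

    double get(int row, int column) {
        int c = componentFor(column);
        return components[c][row][column - columnOffsets[c]];
    }

    void set(int row, int column, double value) {
        int c = componentFor(column);
        components[c][row][column - columnOffsets[c]] = value;
    }

    /** Finds the component owning a global column: the last c with columnOffsets[c] <= column. */
    private int componentFor(int column) {
        if (column < 0 || column >= columnSize()) {
            throw new IndexOutOfBoundsException("column " + column);
        }
        int lo = 0;
        int hi = components.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (columnOffsets[mid] <= column) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

Taking the last component whose offset is not past the column skips empty components, and because set writes into the component arrays directly, the view stays a view rather than a copy.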



On Sun, Apr 14, 2013 at 4:35 AM, Gokhan Capan <gk...@gmail.com> wrote:

> Ted,
>
> I wrote one yesterday. Basically it is a view implementing matrix, which
> allows viewing and iterating on rows as if they are concatenated, via
> VectorSuperView.
>
> Class naming can definitely change though.
>
> I'll change the LuceneMatrix code to return single matrix for multiple
> fields (using this view), too.
>
> Could you have a look at this (only the matrix and vector views) so I
> submit a diff (after handling labels), refactor and resubmit LuceneMatrix
> patch, and then continue to work on Factorization Machines so it can
> operate on a single matrix?
>
> The code is here (Adding exact locations for each related new class
> because I did a kind of bad commit, from the top directory)
>
>
> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/MatrixSuperView.java
>
>
> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/VectorSuperView.java
>
>
> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/MatrixSuperViewTest.java
>
>
> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/VectorSuperViewTest.java

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

Ted,

I wrote one yesterday. Basically it is a view implementing Matrix that
allows viewing and iterating over rows as if they were concatenated, via
VectorSuperView.

The class names can definitely change, though.

I'll change the LuceneMatrix code to return a single matrix for multiple
fields (using this view), too.

Could you have a look at this (only the matrix and vector views) so that I
can submit a diff (after handling labels), refactor and resubmit the
LuceneMatrix patch, and then continue working on Factorization Machines so
that it can operate on a single matrix?

The code is here (I'm giving the exact location of each related new class
because I made a somewhat bad commit from the top directory):

https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/MatrixSuperView.java

https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/VectorSuperView.java

https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/MatrixSuperViewTest.java

https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/VectorSuperViewTest.java


On Sat, Apr 13, 2013 at 10:05 AM, Ted Dunning <te...@gmail.com> wrote:

> What would this MatrixSuperView do?  Would ConcatenatedMatrix be a better
> name?
>
> Sent from my iPhone



-- 
Gokhan
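Robin's suggestion in this thread (fold all feature groups into a single vector under different feature ids) is essentially a data-preparation step, in the spirit of the tools mentioned for turning client data into the algorithm's input. A minimal sketch of such an encoder follows, assuming each group (for example the active user, active item and time groups from the libFM example) gets a contiguous block of ids of fixed size. FeatureFolder and its methods are hypothetical, not an existing Mahout class.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical encoder that folds several feature groups (e.g. user, item, time)
 * into one sparse row by giving each group its own contiguous block of feature ids.
 */
final class FeatureFolder {
    private final Map<String, Integer> offsets = new HashMap<>();
    private final Map<String, Integer> sizes = new HashMap<>();
    private int totalSize = 0;

    /** Registers a group; groups are laid out in registration order. */
    FeatureFolder addGroup(String name, int size) {
        if (offsets.containsKey(name)) {
            throw new IllegalArgumentException("duplicate group " + name);
        }
        if (size < 0) {
            throw new IllegalArgumentException("negative size for group " + name);
        }
        offsets.put(name, totalSize);
        sizes.put(name, size);
        totalSize += size;
        return this;
    }

    /** Width of the folded vector, i.e. the sum of all group sizes. */
    int totalSize() {
        return totalSize;
    }

    /** Maps a (group, local id) pair to its global feature id. */
    int featureId(String group, int localId) {
        Integer offset = offsets.get(group);
        if (offset == null) {
            throw new IllegalArgumentException("unknown group " + group);
        }
        if (localId < 0 || localId >= sizes.get(group)) {
            throw new IndexOutOfBoundsException(group + " id " + localId);
        }
        return offset + localId;
    }

    /** Writes one group's feature value into a folded row keyed by global feature id. */
    void put(Map<Integer, Double> row, String group, int localId, double value) {
        row.put(featureId(group, localId), value);
    }
}
```

With groups user (3 ids), item (4 ids) and time (1 id), user 2, item 0 and the time feature land at global ids 2, 3 and 7 of an 8-wide row. Fixing each group's size up front keeps ids stable across rows, so every row can be fed to a model that takes a single sparse vector per example.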

Re: factorization machines as new project

Posted by Ted Dunning <te...@gmail.com>.

What would this MatrixSuperView do?  Would ConcatenatedMatrix be a better name?

Sent from my iPhone

On Apr 12, 2013, at 1:26, Gokhan Capan <gk...@gmail.com> wrote:

> Ted,
> 
> How about a MatrixSuperView implements Matrix? (A MatrixView like implementation)
> 
> 
> On Fri, Apr 12, 2013 at 2:28 AM, Gokhan Capan <gk...@gmail.com> wrote:
> So if I understood correctly, the algorithm still runs on matrix, and a client still can pass a group of matrices.
>  
> Again it came to data preparation:)
> 
> I will refactor the implementation to run on single matrix, but provide tools for turning the obvious client data into actual input to the algorithm.
> 
> Sent from my iPhone
> 
> On Apr 12, 2013, at 1:13, Ted Dunning <te...@gmail.com> wrote:
> 
>> One easy thing to do is to build an adjoined matrix type that does the concatenation on the fly.
>> 
>> 
>> 
>> 
>> On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gk...@gmail.com> wrote:
>> Yeah, it is simpler indeed.
>> 
>> I am going to think about alternative ways to make concatenation easier for clients. 
>> 
>> Thanks for your review
>> 
>> 
>> On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <ro...@gmail.com> wrote:
>> I would have folded them all as different feature ids in a single vector, makes things a lot simpler and faster.
>> 
>> 
>> On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gk...@gmail.com> wrote:
>> Hi Robin,
>> 
>> If you are asking why they are arrays, it is because to save clients from concatenating multiple matrices to create the input.
>> 
>> I am quoting from libFM paper: "For easier interpretation,
>> the features are grouped into indicators for the active user (blue), active item (red), other movies rated
>> by the same user (orange), the time in months (green), and the last movie rated (brown)." 
>> 
>> I thought a client would create multiple group of matrices, and he can just pass them all to the algorithm.
>> 
>> Then the wModel is w parameters, it is still array of vectors for me to keep the indexing consistent, and vModel is the V parameters.
>> 
>> Was that what you were asking?
>> 
>> 
>> On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <ro...@gmail.com> wrote:
>> Comments away. I was a bit confused by the use of Vector[] for w1 and Matrix[] for inputs.
>> 
>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> 
>> 
>> On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gk...@gmail.com> wrote:
>> Ted,
>> Robin,
>> 
>> Although I did not test on a dataset yet, recently I've been implementing Factorization Machines with SGD optimization.
>> 
>> The initial implementation is at https://github.com/gcapan/mahout/tree/fm
>> 
>> Would you guys consider to take a look so I can make it better and running?
>> 
>> 
>> 
>> On Mon, Apr 1, 2013 at 8:45 PM, Nkechi Nnadi <nk...@gmail.com> wrote:
>> Hello,
>> 
>> I'm long time lurker.  I would be interested in implementing these.  I
>> thought I would get my feet wet with contributing to wiki with tutorials
>> since I have used Mahout for recommendation and clustering in my
>> dissertation.  I have never contributed code before and I would love to
>> start now.
>> 
>> -Nkechi
>> 
>> 
>> On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <ro...@gmail.com> wrote:
>> 
>> > FMs work really well for a whole range of things. Having implemented them
>> > myself, I can extend my services as a reviewer if anyone is willing to
>> > start on it.
>> >
>> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >
>> >
>> > On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <te...@gmail.com>
>> > wrote:
>> >
>> > > Relative to Dan's recent mention of SOM as possible new project, here are
>> > > slides from KDD Cup 2012 in which Stephen Rendle describes how he did
>> > using
>> > > a very straightforward implementation of Factorization Machines [1,2].
>> > >
>> > >
>> > > FMs are interesting in the context of Mahout because they can be used in
>> > a
>> > > wide variety of settings including recommendation and targeting and
>> > because
>> > > they have very good performance on a number of tasks.
>> > >
>> > > I should mention that Robin was the one who first mentioned FMs to me.
>> > >
>> > > The KDD 2012 competition [3] is of interest in any case because it
>> > provides
>> > > a large amount of realistic data for commercially important problems.
>> > >
>> > > [1]
>> > >
>> > >
>> > https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
>> > >
>> > > [2]
>> > >
>> > >
>> > https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
>> > >
>> > > [3] http://www.kddcup2012.org/
>> > >
>> >
>> 
>> 
>> 
>> -- 
>> Gokhan
>> 
>> 
>> 
>> 
>> -- 
>> Gokhan
>> 
>> 
>> 
>> 
>> -- 
>> Gokhan
>> 
> 
> 
> 
> -- 
> Gokhan

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

Ted,

How about a *MatrixSuperView implements Matrix*? (A MatrixView-like
implementation.)


On Fri, Apr 12, 2013 at 2:28 AM, Gokhan Capan <gk...@gmail.com> wrote:

> So if I understood correctly, the algorithm still runs on a matrix, and a
> client can still pass a group of matrices.
>
> Again, it came down to data preparation :)
>
> I will refactor the implementation to run on a single matrix, but provide
> tools for turning client data in its obvious form into actual input for the
> algorithm.
>


-- 
Gokhan

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

So if I understood correctly, the algorithm still runs on a matrix, and a
client can still pass a group of matrices.

Again, it came down to data preparation :)

I will refactor the implementation to run on a single matrix, but provide
tools for turning client data in its obvious form into actual input for the
algorithm.

On Apr 12, 2013, at 1:13, Ted Dunning <te...@gmail.com> wrote:

> One easy thing to do is to build an adjoined matrix type that does the
> concatenation on the fly.





Re: factorization machines as new project

Posted by Ted Dunning <te...@gmail.com>.

One easy thing to do is to build an adjoined matrix type that does the
concatenation on the fly.
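One way to picture this suggestion is the minimal sketch below. It assumes a hypothetical read-only `SimpleMatrix` interface instead of Mahout's real `Matrix` interface (which has many more methods), so `SimpleMatrix`, `DenseBlock`, and `AdjoinedMatrix` are illustrative names only. The adjoined view keeps the blocks side by side and maps each global column to a block and a local column on every read, so nothing is copied.

```java
/** Hypothetical minimal read-only matrix interface (not Mahout's Matrix API). */
interface SimpleMatrix {
  int numRows();
  int numCols();
  double get(int row, int col);
}

/** Dense array-backed block, used here only to build small examples. */
final class DenseBlock implements SimpleMatrix {
  private final double[][] values;

  DenseBlock(double[][] values) {
    this.values = values;
  }

  @Override public int numRows() { return values.length; }
  @Override public int numCols() { return values.length == 0 ? 0 : values[0].length; }
  @Override public double get(int row, int col) { return values[row][col]; }
}

/**
 * Column-wise concatenation [A | B | C ...] evaluated on the fly:
 * every read is forwarded to the block that owns the column, so nothing is copied.
 */
final class AdjoinedMatrix implements SimpleMatrix {
  private final SimpleMatrix[] blocks;
  private final int[] colOffsets; // colOffsets[b] = first global column of block b
  private final int rows;
  private final int cols;

  AdjoinedMatrix(SimpleMatrix... blocks) {
    if (blocks.length == 0) {
      throw new IllegalArgumentException("need at least one block");
    }
    this.blocks = blocks.clone();
    this.rows = blocks[0].numRows();
    this.colOffsets = new int[blocks.length];
    int offset = 0;
    for (int b = 0; b < blocks.length; b++) {
      if (blocks[b].numRows() != rows) {
        throw new IllegalArgumentException("row count mismatch in block " + b);
      }
      colOffsets[b] = offset;
      offset += blocks[b].numCols();
    }
    this.cols = offset;
  }

  @Override public int numRows() { return rows; }
  @Override public int numCols() { return cols; }

  @Override
  public double get(int row, int col) {
    if (col < 0 || col >= cols) {
      throw new IndexOutOfBoundsException("column " + col);
    }
    // Linear scan over blocks (skips zero-width ones); fine for a handful of feature groups.
    int b = 0;
    while (col >= colOffsets[b] + blocks[b].numCols()) {
      b++;
    }
    return blocks[b].get(row, col - colOffsets[b]);
  }
}
```

A real Mahout version would presumably implement the full `Matrix` interface (for example by extending `AbstractMatrix`), which means forwarding writes and row/column views as well, not just `get`.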




On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gk...@gmail.com> wrote:

> Yeah, it is simpler indeed.
>
> I am going to think about alternative ways to make concatenation easier
> for clients.
>
> Thanks for your review
>
>
> --
> Gokhan
>

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

Yeah, it is simpler indeed.

I am going to think about alternative ways to make concatenation easier for
clients.

Thanks for your review


On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <ro...@gmail.com> wrote:

> I would have folded them all into a single vector as different feature IDs;
> that makes things a lot simpler and faster.
>


-- 
Gokhan

Re: factorization machines as new project

Posted by Robin Anil <ro...@gmail.com>.

I would have folded them all into a single vector as different feature IDs;
that makes things a lot simpler and faster.
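A minimal sketch of this alternative, assuming every feature group (for example user, item, month) has a fixed, known size: each group gets a contiguous range of global IDs, so all groups end up in one sparse input vector. `FeatureFolder` is a hypothetical name, and a plain `Map<Integer, Double>` stands in for a real sparse vector class (in Mahout, for example, `RandomAccessSparseVector`).

```java
import java.util.Map;

/**
 * Hypothetical helper that folds several feature groups into one global
 * feature-ID space: group g owns IDs [offsets[g], offsets[g] + sizes[g]).
 */
final class FeatureFolder {
  private final int[] offsets; // offsets[g] = first global ID of group g
  private final int[] sizes;
  private final int dimension;

  FeatureFolder(int... groupSizes) {
    this.sizes = groupSizes.clone();
    this.offsets = new int[groupSizes.length];
    int next = 0;
    for (int g = 0; g < groupSizes.length; g++) {
      offsets[g] = next;
      next += groupSizes[g];
    }
    this.dimension = next;
  }

  /** Total number of features after folding. */
  int dimension() {
    return dimension;
  }

  /** Maps a (group, local index) pair to its global feature ID. */
  int globalId(int group, int localIndex) {
    if (localIndex < 0 || localIndex >= sizes[group]) {
      throw new IndexOutOfBoundsException("index " + localIndex + " in group " + group);
    }
    return offsets[group] + localIndex;
  }

  /** Adds a value at (group, localIndex) to a sparse vector keyed by global ID. */
  void add(Map<Integer, Double> sparse, int group, int localIndex, double value) {
    sparse.merge(globalId(group, localIndex), value, Double::sum);
  }
}
```

For example, with a user group of size 3, an item group of size 4, and 12 month buckets, user 1, item 2, and month 5 become global IDs 1, 5, and 12 in a 19-dimensional vector, and the learner only ever sees that one vector.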



On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gk...@gmail.com> wrote:

> Hi Robin,
>
> If you are asking why they are arrays, it is to save clients from
> concatenating multiple matrices to create the input.
>
> I am quoting from the libFM paper<http://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf>:
> "For easier interpretation, the features are grouped into indicators for
> the active user (blue), active item (red), other movies rated by the same
> user (orange), the time in months (green), and the last movie rated
> (brown)."
>
> I thought a client would create multiple groups of matrices and could just
> pass them all to the algorithm.
>
> Then, wModel holds the w parameters (it is still an array of vectors, to
> keep the indexing consistent), and vModel holds the V parameters.
>
> Was that what you were asking?
>
>
> --
> Gokhan
>

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

Hi Robin,

If you are asking why they are arrays, it is to save clients from
concatenating multiple matrices to create the input.

I am quoting from the libFM
paper<http://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf>:
"For easier interpretation, the features are grouped into indicators for
the active user (blue), active item (red), other movies rated by the same
user (orange), the time in months (green), and the last movie rated
(brown)."

I thought a client would create multiple groups of matrices and could just
pass them all to the algorithm.

Then, wModel holds the w parameters (it is still an array of vectors, to
keep the indexing consistent), and vModel holds the V parameters.

Was that what you were asking?
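For reference, this is the degree-2 factorization machine model from Rendle's FM papers, which is where the w and V parameters come from; reading `wModel` as w and `vModel` as V follows the description in this message. The second identity is what lets the pairwise term be computed in O(k n) rather than O(k n^2).

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad w_0 \in \mathbb{R},\; \mathbf{w} \in \mathbb{R}^{n},\; V \in \mathbb{R}^{n \times k}

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \Big( \sum_{i=1}^{n} v_{i,f}\, x_i \Big)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2} \right]
```

Each row v_i of V belongs to one global feature index i, so when features are split across several input matrices, w and V have to be indexed consistently across the groups, which is presumably the bookkeeping the arrays of vectors are doing.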


On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <ro...@gmail.com> wrote:

> Comments away. I was a bit confused by the use of Vector[] for w1 and
> Matrix[] for inputs.
>


-- 
Gokhan

Re: factorization machines as new project

Posted by Robin Anil <ro...@gmail.com>.

Comments away. I was a bit confused by the use of Vector[] for w1 and
Matrix[] for inputs.



On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gk...@gmail.com> wrote:

> Ted,
> Robin,
>
> Although I have not tested it on a dataset yet, I have recently been
> implementing Factorization Machines with SGD optimization.
>
> The initial implementation is at https://github.com/gcapan/mahout/tree/fm
>
> Would you guys consider taking a look so I can improve it and get it running?
>
>
>
>
> --
> Gokhan
>

Re: factorization machines as new project

Posted by Gokhan Capan <gk...@gmail.com>.

Ted,
Robin,

Recently I've been implementing Factorization Machines with SGD optimization,
although I haven't tested it on a dataset yet.

The initial implementation is at https://github.com/gcapan/mahout/tree/fm

Would you consider taking a look so I can improve it and get it running?

-- 
Gokhan