You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mahout.apache.org by 梁明强 <mq...@gmail.com> on 2014/12/23 07:00:27 UTC

code addition proposal: Matrix Factorization(MF) method

Dear sir,

I read the source code of Mahout, it seems hasn’t Matrix Factorization(MF) method.

MF is very useful in recommender system, especially when the dataset is very sparse. Even though SVD and QR decomposition methods are successfully applied to solve some practical problems, it suffers severe overfitting when applied to sparse matrices. 

In fact, Matrix Factorization is very different to Matrix Decomposition, although they look very similar. Simply put, Matrix Decomposition is used to reduce dimension, and the original matrix need to be dense; but Matrix Factorization is usually used to predict users' rating on some items when the original matrix is very sparse.

So, I want to implement some Matrix Factorization method for Mahout, such as Non-negative Matrix Factorization(NMF) and Probabilistic Matrix Factorization(PMF).

If my proposal sounds interesting, do we need establish a new git branch, or just fork and pull request?

And here is my elementary implementation of NMF in java，using multiplicative update rules. In the next step I want to implement PMF, optimize them and implement the distribution version.


package org.apache.mahout.math;

import java.util.Random;
import org.apache.mahout.math.function.DoubleFunction;
public class NMF{
    private Matrix w;
    private Matrix h;

    /**
     * @param v Matrix original
     * @param r model order
     * @param steps max steps before converge
     * @param errMax threshold of object function change rate
     */
    public NMF(Matrix v, int r, int steps, double errMax) {
        double oldObj, obj;
        int n = v.rowSize();
        int m = v.columnSize();

        w = new DenseMatrix(n, r).assign(RANDOMF);
        h = new DenseMatrix(r, m).assign(RANDOMF);

        for (int step = 0; step < steps; step++) {

            oldObj = calObjectFunction(v, w, h);

            //update h
            Matrix wtv = wt.times(v);
            Matrix wtwh = wt.times(wh);
            for (int i = 0; i < r; i++) {
                for (int j = 0; j < m; j++) {
                    if(wtwh.get(i, j) > 0)
                        h.set(i, j, h.get(i,j)*
                            (wtv.get(i,j)/wtwh.get(i,j)) );
                }
            }

	    // update w
            Matrix vht = v.times(h.transpose());
            Matrix whht = w.times(h).times(h.transpose());
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < r; j++) {
                    if(whht.get(i, j) > 0)
                        w.set(i, j, w.get(i,j)*
                            (vht.get(i,j)/whht.get(i,j)) );
                    
                }
            }


            obj = calObjectFunction(v, w, h);
            double erro = oldObj - obj;
            if(erro < errMax) break;

        }

    }

    /**
     * Calculate the value of object function
     * @param v Matrix original
     * @param w Matrix factor
     * @param h Matrix factor
     * @return Value of object function
     */
    static double calObjectFunction(Matrix v, Matrix w, Matrix h){
        Matrix wh = w.times(h);
        Matrix minus = v.minus(wh);
        double err = 0;
        for(int i=0; i<minus.rowSize(); i++)
            for(int j=0; j<minus.columnSize(); j++)
                err += minus.get(i, j) * minus.get(i, j);
        return err;
    }


    private static final DoubleFunction RANDOMF = new DoubleFunction() {
        @Override
        public double apply(double a) {
            Random random = new Random();
            return random.nextDouble();
        }
    };

    public Matrix getW() {
        return w;
    }

    public Matrix getH() {
        return h;
    }


}

Re: code addition proposal: Matrix Factorization(MF) method

Posted by Andrew Musselman <an...@gmail.com>.

I'm cool with an NMF/PMF addition; you can make pull requests for further comments.

> On Dec 22, 2014, at 10:00 PM, 梁明强 <mq...@gmail.com> wrote:
> 
> Dear sir,
> 
> I read the source code of Mahout, it seems hasn’t Matrix Factorization(MF) method.
> 
> MF is very useful in recommender system, especially when the dataset is very sparse. Even though SVD and QR decomposition methods are successfully applied to solve some practical problems, it suffers severe overfitting when applied to sparse matrices. 
> 
> In fact, Matrix Factorization is very different to Matrix Decomposition, although they look very similar. Simply put, Matrix Decomposition is used to reduce dimension, and the original matrix need to be dense; but Matrix Factorization is usually used to predict users' rating on some items when the original matrix is very sparse.
> 
> So, I want to implement some Matrix Factorization method for Mahout, such as Non-negative Matrix Factorization(NMF) and Probabilistic Matrix Factorization(PMF).
> 
> If my proposal sounds interesting, do we need establish a new git branch, or just fork and pull request?
> 
> And here is my elementary implementation of NMF in java，using multiplicative update rules. In the next step I want to implement PMF, optimize them and implement the distribution version.
> 
> 
> package org.apache.mahout.math;
> 
> import java.util.Random;
> import org.apache.mahout.math.function.DoubleFunction;
> public class NMF{
>    private Matrix w;
>    private Matrix h;
> 
>    /**
>     * @param v Matrix original
>     * @param r model order
>     * @param steps max steps before converge
>     * @param errMax threshold of object function change rate
>     */
>    public NMF(Matrix v, int r, int steps, double errMax) {
>        double oldObj, obj;
>        int n = v.rowSize();
>        int m = v.columnSize();
> 
>        w = new DenseMatrix(n, r).assign(RANDOMF);
>        h = new DenseMatrix(r, m).assign(RANDOMF);
> 
>        for (int step = 0; step < steps; step++) {
> 
>            oldObj = calObjectFunction(v, w, h);
> 
>            //update h
>            Matrix wtv = wt.times(v);
>            Matrix wtwh = wt.times(wh);
>            for (int i = 0; i < r; i++) {
>                for (int j = 0; j < m; j++) {
>                    if(wtwh.get(i, j) > 0)
>                        h.set(i, j, h.get(i,j)*
>                            (wtv.get(i,j)/wtwh.get(i,j)) );
>                }
>            }
> 
>        // update w
>            Matrix vht = v.times(h.transpose());
>            Matrix whht = w.times(h).times(h.transpose());
>            for (int i = 0; i < n; i++) {
>                for (int j = 0; j < r; j++) {
>                    if(whht.get(i, j) > 0)
>                        w.set(i, j, w.get(i,j)*
>                            (vht.get(i,j)/whht.get(i,j)) );
> 
>                }
>            }
> 
> 
>            obj = calObjectFunction(v, w, h);
>            double erro = oldObj - obj;
>            if(erro < errMax) break;
> 
>        }
> 
>    }
> 
>    /**
>     * Calculate the value of object function
>     * @param v Matrix original
>     * @param w Matrix factor
>     * @param h Matrix factor
>     * @return Value of object function
>     */
>    static double calObjectFunction(Matrix v, Matrix w, Matrix h){
>        Matrix wh = w.times(h);
>        Matrix minus = v.minus(wh);
>        double err = 0;
>        for(int i=0; i<minus.rowSize(); i++)
>            for(int j=0; j<minus.columnSize(); j++)
>                err += minus.get(i, j) * minus.get(i, j);
>        return err;
>    }
> 
> 
>    private static final DoubleFunction RANDOMF = new DoubleFunction() {
>        @Override
>        public double apply(double a) {
>            Random random = new Random();
>            return random.nextDouble();
>        }
>    };
> 
>    public Matrix getW() {
>        return w;
>    }
> 
>    public Matrix getH() {
>        return h;
>    }
> 
> 
> }
>