Posted to mapreduce-user@hadoop.apache.org by Song Liu <so...@gmail.com> on 2011/01/20 20:48:08 UTC

Multiplicative Model of Upscaling Several Machine Learning Algorithms on MapReduce/Hadoop

Dear all, 
  
    First, I apologize if this email goes to the wrong place, and I would appreciate it if anyone corrects me. 

    I would like to introduce a new project, "BigO", to you all. It is a set of tools that uses multiplicative methods to upscale several machine learning algorithms on MapReduce, and it is inspired by my master's thesis, "Upscaling Several Key Machine Learning Algorithms". 

    Around March last year, I noticed a paper published by Microsoft Research:

    http://research.microsoft.com/apps/pubs/default.aspx?id=119077

    and found the generalized approach introduced by this paper really fascinating and very adaptable to some other machine learning algorithms. So, in my master's thesis, I designed two generic multiplicative models on MapReduce, and three algorithms (NMF, SVM and PageRank) are implemented under these models. Maybe this cannot be counted as a "ground-breaking" discovery, since I've read a lot of papers on the topic, and I do not claim that our implementation is faster than any specialized implementation. But I believe it may be worth trying to summarize several machine learning implementations under the same model and provide generic solutions. 
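To give a feel for why these algorithms fit a "multiplicative" model, here is a rough in-memory sketch of NMF via the well-known Lee–Seung multiplicative update rules (NumPy for illustration only; function and variable names are my own, not from the BigO release, which runs the underlying matrix products on MapReduce):

```python
import numpy as np

# Minimal NMF with Lee-Seung multiplicative updates: each update is an
# element-wise product with a ratio of matrix products, so a single
# generic matrix-multiplication engine covers the whole algorithm.
def nmf(V, k, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H multiplicatively
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W multiplicatively
    return W, H

# Toy usage: factor a small non-negative matrix into rank-3 factors.
V = np.abs(np.random.default_rng(1).random((6, 5)))
W, H = nmf(V, k=3)
print(np.linalg.norm(V - W @ H))
```

Because the updates only ever multiply non-negative quantities, W and H stay non-negative throughout, which is the whole trick of this family of algorithms.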

    Our methods are designed to solve the following two multiplicative problems:
    1. Similarity Measure: Measuring similarity is common when given a training set containing feature vectors. This model is characterized by the high density of its output matrix and its large intermediate output. 

    2. Iterative Multiplication: Several matrix multiplications need to be computed iteratively for some solutions (e.g. optimization problems), so a light-weight matrix multiplication implementation may be designed. 
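As a toy illustration of model 1 (my own sketch, not code from the release): pairwise cosine similarity over a feature matrix reduces to a single matrix product, and the n x n output is dense even when the n x d input is sparse, which is exactly why this model stresses output and intermediate size:

```python
import numpy as np

# Pairwise cosine similarity as one matrix product: row-normalize the
# feature matrix, then take its Gram matrix. The n x n result is dense
# even for sparse inputs, the defining cost of this model.
def cosine_similarity_matrix(A, eps=1e-12):
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    An = A / np.maximum(norms, eps)   # unit-length feature vectors
    return An @ An.T                  # Gram matrix of similarities

# Three toy feature vectors: rows 0 and 1 identical, row 2 orthogonal.
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
S = cosine_similarity_matrix(A)
```

On MapReduce the same product would be computed blockwise, but the shape of the problem (one big multiplication, dense result) is the same.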
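And for model 2, PageRank is the classic case: the same matrix-vector product is repeated until the rank vector converges, so each iteration is one multiplication that a generic engine can distribute. A toy in-memory sketch under my own naming (the release works on MapReduce instead):

```python
import numpy as np

# PageRank by power iteration: every step is a single matrix-vector
# multiplication with the column-stochastic link matrix M, repeated
# until the rank vector r stops changing.
def pagerank(M, d=0.85, tol=1e-10, max_iter=200):
    n = M.shape[0]
    r = np.full(n, 1.0 / n)                 # start from the uniform vector
    for _ in range(max_iter):
        r_next = (1 - d) / n + d * (M @ r)  # one iterative multiplication
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy 3-page graph forming a cycle: 0 -> 1 -> 2 -> 0,
# so every page ends up with the same rank.
M = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
r = pagerank(M)
```

SVM training via iterative solvers follows the same pattern: the expensive kernel of each iteration is a matrix product, which is what makes a shared light-weight multiplication layer attractive.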

    By using these two models, we show that several learning algorithms can be adapted efficiently. We are preparing a conference paper submission. 

    For anyone who may be interested, I invite you to visit our website:
    http://code.google.com/p/bigo2/

     The latest release is available there, and several toy examples can be tried (please see the instructions in the release package). I'm going to make a new release containing more algorithms in February, and a detailed summary of the implementation will be uploaded soon. 

    I'm currently looking for support/ideas/opinions for this project, and anyone interested is welcome to send me an email at song.liu.awesome@gmail.com

Best Wishes
Song Liu