Posted to user@mahout.apache.org by "ahmed.nagy" <ah...@hotmail.com> on 2011/01/10 13:16:19 UTC

Distributed Matrix Multiplication and operations

I am implementing a matrix factorisation technique for matrices that do not
fit in the memory of a single node. I have checked the documentation and the
book Mahout in Action for the distributed matrix operations in
DistributedRowMatrix. I need to carry out some distributed matrix operations.
I have designed the algorithm as follows:
Three matrices: A, B and C
Divide the matrix A into chunks
Divide C into chunks 
Map chunks of A, C and the matrix B 
Compute the updates 
Reduce Matrix C then compute Matrix B 
Repeat the above set of operations for Maxiterations
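
To make the chunking idea in the steps above concrete, here is a minimal
single-machine sketch in plain Java. It only illustrates splitting A into row
chunks, multiplying each chunk by B independently (the "map" part), and
concatenating the partial results (the "reduce" part). The class and method
names are hypothetical, it does not use Mahout or Hadoop, and it leaves out the
factorisation-specific update rule, which is not spelled out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Mahout or Hadoop code.
class ChunkedMultiply {

    // "Map" step: multiply one chunk of rows of A by the full matrix B.
    static double[][] multiplyChunk(double[][] aChunk, double[][] b) {
        int n = b[0].length;
        double[][] out = new double[aChunk.length][n];
        for (int i = 0; i < aChunk.length; i++) {
            for (int k = 0; k < b.length; k++) {
                double aik = aChunk[i][k];
                for (int j = 0; j < n; j++) {
                    out[i][j] += aik * b[k][j];
                }
            }
        }
        return out;
    }

    // Split A into row chunks, process each chunk independently, then
    // concatenate the partial results in row order (the "reduce" step).
    static double[][] multiplyInChunks(double[][] a, double[][] b, int chunkRows) {
        List<double[]> rows = new ArrayList<>();
        for (int start = 0; start < a.length; start += chunkRows) {
            int end = Math.min(start + chunkRows, a.length);
            double[][] chunk = Arrays.copyOfRange(a, start, end);
            rows.addAll(Arrays.asList(multiplyChunk(chunk, b)));
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[][] b = {{1, 0, 2}, {0, 1, 3}};
        // Chunks of 2 rows: the last chunk holds the single remaining row.
        double[][] c = multiplyInChunks(a, b, 2);
        System.out.println(Arrays.deepToString(c));
    }
}
```

Because each row chunk of A only needs B to produce its rows of the result,
the chunks can be processed in any order, which is what makes this layout a
natural fit for mappers.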
1-Do I need to distribute the matrices on the cluster if I am carrying out
these operations?
2-How can I control the amount of parallelism, for example the number of
mappers?
3-When I used the constructor of DistributedRowMatrix
DistributedRowMatrix m = new
DistributedRowMatrix("path/to/vector/sequenceFile", "tmp/path", 10000000,
250000);
from the example found at

https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/math/hadoop/DistributedRowMatrix.html#getOutputTempPath()

it gives the error "The constructor DistributedRowMatrix(String, String, int,
int) is undefined".
I dug a bit and found that the example passes two Strings as the first two
parameters, whereas the constructor expects them to be of type Path. So I
defined and initialised them like this:
Path in = new Path("path/to/vector/sequenceFile");
Path out = new Path("/tmp/path");
Then I passed in and out as parameters:
DistributedRowMatrix m = new DistributedRowMatrix(in, out, 10000000, 250000);
4-Another point is that m.configure(new JobConf()); produces a warning that
JobConf is deprecated.
5-Is there any side effect from using the deprecated JobConf?
6-Could anybody point me to how to package this job and run it on a
cluster?
7-Also, I am not sure how to pass the sequence file when it resides on
HDFS.
Sorry if some of the questions look naive.
I appreciate any insights.
Regards
Ahmed Nagy



Re: Distributed Matrix Multiplication and operations

Posted by Ted Dunning <te...@gmail.com>.

Can you say more about the factorization that you are implementing?

In particular, would it benefit from the random projection work that Dmitriy
Lyubimov has been doing?

https://issues.apache.org/jira/browse/MAHOUT-376

On Mon, Jan 10, 2011 at 4:16 AM, ahmed.nagy <ah...@hotmail.com> wrote:

> I am implementing a matrix factorisation technique for matrices that does
> not fit in memory of a node.
>