You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-user@hadoop.apache.org by John Norstad <j-...@northwestern.edu> on 2009/12/08 17:44:43 UTC

A MapReduce Algorithm for Matrix Multiplication

As an exercise while learning MapReduce, I developed an algorithm for matrix multiplication and wrote it up on my web site. If you're interested, it's at:

   http://homepage.mac.com/j.norstad/matrix-multiply

I present the algorithm in English, as pseudo-code, and as Java source code for Hadoop.

-- 
John Norstad
Academic and Research Technologies
Northwestern University




Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by chwaqas254 <wa...@gmail.com>.
Hi all,

The implementation given in the following link only works while working with
standalone mode and not while  in pseudodistributed mode or distributed
mode:
http://homepage.mac.com/j.norstad/matrix-multiply

To be percise for sparse matrices it works fine but with dense matrices
strategy 1 and 4 does not work. Does anyone have solution for this or at
lease suggest that what can be the problem and how to solve it.Thanks

Regards,
wl

--
View this message in context: http://lucene.472066.n3.nabble.com/Fwd-A-MapReduce-Algorithm-for-Matrix-Multiplication-tp639798p3585667.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi mahout,

OK, I'm also somewhat skeptical about the M/R based algebraic
computations after testing them on the large cluster (100 ~ 1000
nodes). M/R baed M/L algorithms? IMHO, It's about the same. :)

Therefore, currently Hama team is working to implement the BSP package
on hadoop, and another computing model/framework based on its
mechanism.

The BSP package of Hama will be introduced to hadoop communities in this week.

Any questions are welcome.

On Thu, Dec 10, 2009 at 3:46 AM, Ted Dunning <te...@gmail.com> wrote:
> I haven't heard of any progress from HAMA and I am not optimistic about
> their proposed approach.
>
> On Wed, Dec 9, 2009 at 10:13 AM, Avram Aelony <aa...@mac.com> wrote:
>
>>
>> This is also a project called HAMA  <http://wiki.apache.org/hama/>.
>> Perhaps your efforts can help further (or coordinate with) HAMA?
>>
>>
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by "Edward J. Yoon" <ed...@udanax.org>.
Hi mahout,

I'm also somewhat skeptical about the M/R based algebraic
computations after testing them on the large cluster (100 ~ 1000
nodes). M/R baed M/L algorithms? IMHO, It's about the same. :)

Therefore, currently Hama team is working to implement the BSP package
on hadoop, and another computing model/framework based on its
mechanism.

The BSP package of Hama will be introduced to hadoop communities in this week.

Any questions are welcome.

On Thu, Dec 10, 2009 at 3:46 AM, Ted Dunning <te...@gmail.com> wrote:
> I haven't heard of any progress from HAMA and I am not optimistic about
> their proposed approach.
>
> On Wed, Dec 9, 2009 at 10:13 AM, Avram Aelony <aa...@mac.com> wrote:
>
>>
>> This is also a project called HAMA  <http://wiki.apache.org/hama/>.
>> Perhaps your efforts can help further (or coordinate with) HAMA?
>>
>>
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Ted Dunning <te...@gmail.com>.
I haven't heard of any progress from HAMA and I am not optimistic about
their proposed approach.

On Wed, Dec 9, 2009 at 10:13 AM, Avram Aelony <aa...@mac.com> wrote:

>
> This is also a project called HAMA  <http://wiki.apache.org/hama/>.
> Perhaps your efforts can help further (or coordinate with) HAMA?
>
>

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Avram Aelony <aa...@mac.com>.
This is also a project called HAMA  <http://wiki.apache.org/hama/>.  
Perhaps your efforts can help further (or coordinate with) HAMA?

Best regards,
Avram


 
On Wednesday, December 09, 2009, at 10:05AM, "Ted Dunning" <te...@gmail.com> wrote:
>See here:
>
>http://arxiv.org/abs/0909.4061v1
>
>On Tue, Dec 8, 2009 at 6:47 PM, Zhengguo 'Mike' SUN
><zh...@yahoo.com>wrote:
>
>> Hi Jake,
>>
>> I am implementing the classical multiplicative update rule of NMF. The
>> matrix to be factorized is really big and sparse. Are you suggesting that I
>> can use some specialised algorithms for sparse matrix instead of the
>> standard multiplication algorithm? But what algorithms are you referring to?
>> Could you please provide some pointers?
>>
>> Thanks
>>
>>
>>
>>
>> ________________________________
>> From: Jake Mannix <ja...@gmail.com>
>> To: mahout-user@lucene.apache.org
>> Sent: Tue, December 8, 2009 8:53:39 PM
>> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>>
>> Hi Mike,
>>
>>   In the NMF work you're doing, the matrices you're multiplying are very
>> thin, right?  They're a handful of dense vectors on the left, an equal
>> number of dense vectors on the right, and you're only computing the values
>> in the outer product which coincide with the nonzero entries of the
>> (sparse)
>> input matrix?  In which case you're not really doing the full matrix
>> multiplication, and doing this a slightly specialized way should provide
>> some serious speedup.
>>
>>   Unless you're not dealing with the "really big, but sparse" case, and
>> minimizing error only over the nonzero inputs?
>>
>>   -jake
>>
>> On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
>> <zh...@yahoo.com>wrote:
>>
>> > Hi Ted, John,
>> >
>> > I am also interested in matrix multiplication. Actually, I am
>> implementing
>> > Non-negative Matrix Factorization, which basically is iterations of
>> matrix
>> > multiplication. What I did is exactly as what Ted said: doing an outer
>> > product using one column from left matrix and one row from right matrix
>> and
>> > summing up all the results. Idealy, this could be done using only one
>> job:
>> > mappers doing multiplication and reducers doing summing. But the thing is
>> > that how do you match a column from left matrix and a row from right
>> matrix.
>> > Assuming that two input matrices are give in column major order and row
>> > major order, it seems to me that you have to implement a customized
>> > InputFormat to do the matching. Or maybe you could use one matrix as
>> input
>> > and read the other inside mappers, but this also has problem when you
>> have a
>> > lot of mappers. The straightforward way seems to be using two jobs. The
>> > mappers of the first job emit the id and the content of the row or column
>> it
>> >  reads and the reducers do the multiplication. The second job is to do
>> the
>> > summing. This is kind of similar to what John did. But it is inefficient
>> if
>> > you have many iterations. Any comments?
>> >
>> >
>> >
>> >
>> > ________________________________
>> > From: Ted Dunning <te...@gmail.com>
>> > To: j-norstad@northwestern.edu; mahout-user <
>> mahout-user@lucene.apache.org
>> > >
>> > Sent: Tue, December 8, 2009 2:59:06 PM
>> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
>> >
>> > John,
>> >
>> > Your comment about matrix multiplication was forwarded to the mahout-user
>> > emailing list.
>> >
>> > I have a question for you about your approach.  Have you considered doing
>> > the multiplication in a single step by storing the the first matrix in
>> > column major order and the second in row major order?  The idea would be
>> to
>> > do a block-wise outer product and use the combiner to sum elements ahead
>> of
>> > the reducer.  This could be done by reading a (block) column from the
>> left
>> > matrix and a (block) row from the second and emitting all of the (block)
>> > elements of the outer product of this column and row.  The key emitted
>> with
>> > these blocks would be the i,j index of the final location of each block
>> in
>> > the final product.  The combiner and reducer could add all of the blocks
>> > together and the use of the combiner would serve to substantially
>> minimize
>> > the amount of network traffic.
>> >
>> > This alternative is equivalent to inverting the loop nesting in a
>> > conventional matrix multiplication algorithm.  It is also very closely
>> > related to the way that cooccurrence counting is typically done in
>> Hadoop.
>> >
>> > Is your project an ongoing effort?
>> >
>> > ---------- Forwarded message ----------
>> > From: Robin Anil <ro...@gmail.com>
>> > Date: Tue, Dec 8, 2009 at 11:09 AM
>> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
>> > To: mahout-user@lucene.apache.org
>> > Cc: John Norstad <j-...@northwestern.edu>
>> >
>> >
>> > ---------- Forwarded message ----------
>> > From: John Norstad <j-...@northwestern.edu>
>> > Date: Tue, Dec 8, 2009 at 10:14 PM
>> > Subject: A MapReduce Algorithm for Matrix Multiplication
>> > To: MapReduce <ma...@hadoop.apache.org>
>> >
>> >
>> > As an exercise while learning MapReduce, I developed an algorithm for
>> > matrix multiplication and wrote it up on my web site. If you're
>> > interested, it's at:
>> >
>> >  http://homepage.mac.com/j.norstad/matrix-multiply
>> >
>> > I present the algorithm in English, as pseudo-code, and as Java source
>> > code for Hadoop.
>> >
>> > --
>> > John Norstad
>> > Academic and Research Technologies
>> > Northwestern University
>> >
>> >
>> >
>> > --
>> > Ted Dunning, CTO
>> > DeepDyve
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>
>
>
>
>-- 
>Ted Dunning, CTO
>DeepDyve
>
>

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Ted Dunning <te...@gmail.com>.
See here:

http://arxiv.org/abs/0909.4061v1

On Tue, Dec 8, 2009 at 6:47 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> Hi Jake,
>
> I am implementing the classical multiplicative update rule of NMF. The
> matrix to be factorized is really big and sparse. Are you suggesting that I
> can use some specialised algorithms for sparse matrix instead of the
> standard multiplication algorithm? But what algorithms are you referring to?
> Could you please provide some pointers?
>
> Thanks
>
>
>
>
> ________________________________
> From: Jake Mannix <ja...@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Tue, December 8, 2009 8:53:39 PM
> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> Hi Mike,
>
>   In the NMF work you're doing, the matrices you're multiplying are very
> thin, right?  They're a handful of dense vectors on the left, an equal
> number of dense vectors on the right, and you're only computing the values
> in the outer product which coincide with the nonzero entries of the
> (sparse)
> input matrix?  In which case you're not really doing the full matrix
> multiplication, and doing this a slightly specialized way should provide
> some serious speedup.
>
>   Unless you're not dealing with the "really big, but sparse" case, and
> minimizing error only over the nonzero inputs?
>
>   -jake
>
> On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
> <zh...@yahoo.com>wrote:
>
> > Hi Ted, John,
> >
> > I am also interested in matrix multiplication. Actually, I am
> implementing
> > Non-negative Matrix Factorization, which basically is iterations of
> matrix
> > multiplication. What I did is exactly as what Ted said: doing an outer
> > product using one column from left matrix and one row from right matrix
> and
> > summing up all the results. Idealy, this could be done using only one
> job:
> > mappers doing multiplication and reducers doing summing. But the thing is
> > that how do you match a column from left matrix and a row from right
> matrix.
> > Assuming that two input matrices are give in column major order and row
> > major order, it seems to me that you have to implement a customized
> > InputFormat to do the matching. Or maybe you could use one matrix as
> input
> > and read the other inside mappers, but this also has problem when you
> have a
> > lot of mappers. The straightforward way seems to be using two jobs. The
> > mappers of the first job emit the id and the content of the row or column
> it
> >  reads and the reducers do the multiplication. The second job is to do
> the
> > summing. This is kind of similar to what John did. But it is inefficient
> if
> > you have many iterations. Any comments?
> >
> >
> >
> >
> > ________________________________
> > From: Ted Dunning <te...@gmail.com>
> > To: j-norstad@northwestern.edu; mahout-user <
> mahout-user@lucene.apache.org
> > >
> > Sent: Tue, December 8, 2009 2:59:06 PM
> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> >
> > John,
> >
> > Your comment about matrix multiplication was forwarded to the mahout-user
> > emailing list.
> >
> > I have a question for you about your approach.  Have you considered doing
> > the multiplication in a single step by storing the the first matrix in
> > column major order and the second in row major order?  The idea would be
> to
> > do a block-wise outer product and use the combiner to sum elements ahead
> of
> > the reducer.  This could be done by reading a (block) column from the
> left
> > matrix and a (block) row from the second and emitting all of the (block)
> > elements of the outer product of this column and row.  The key emitted
> with
> > these blocks would be the i,j index of the final location of each block
> in
> > the final product.  The combiner and reducer could add all of the blocks
> > together and the use of the combiner would serve to substantially
> minimize
> > the amount of network traffic.
> >
> > This alternative is equivalent to inverting the loop nesting in a
> > conventional matrix multiplication algorithm.  It is also very closely
> > related to the way that cooccurrence counting is typically done in
> Hadoop.
> >
> > Is your project an ongoing effort?
> >
> > ---------- Forwarded message ----------
> > From: Robin Anil <ro...@gmail.com>
> > Date: Tue, Dec 8, 2009 at 11:09 AM
> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> > To: mahout-user@lucene.apache.org
> > Cc: John Norstad <j-...@northwestern.edu>
> >
> >
> > ---------- Forwarded message ----------
> > From: John Norstad <j-...@northwestern.edu>
> > Date: Tue, Dec 8, 2009 at 10:14 PM
> > Subject: A MapReduce Algorithm for Matrix Multiplication
> > To: MapReduce <ma...@hadoop.apache.org>
> >
> >
> > As an exercise while learning MapReduce, I developed an algorithm for
> > matrix multiplication and wrote it up on my web site. If you're
> > interested, it's at:
> >
> >  http://homepage.mac.com/j.norstad/matrix-multiply
> >
> > I present the algorithm in English, as pseudo-code, and as Java source
> > code for Hadoop.
> >
> > --
> > John Norstad
> > Academic and Research Technologies
> > Northwestern University
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
> >
> >
> >
> >
>
>
>
>




-- 
Ted Dunning, CTO
DeepDyve

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Jake Mannix <ja...@gmail.com>.
On Tue, Dec 8, 2009 at 7:27 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

>
> No, I am not computing entries in X-WH. I am doing iterations on the
> following two rules:
>
> H <- H*(WtX)/(WtWH),      W<- W*(XHt)/(WHHt)
>
>
Hmm... I haven't thought about this for much more than this today, but let's
see...

So if your X matrix is N by M, with each of the N rows having an average of
d non-empty
values, and you want a rank-k factorization, producing W'WH and WHH' is
going to take
something like O(k^2 * (N+M)) flops each, right?

And you need to iterate this again and again until convergence?  How many
iterations does this
tend to take?  It seems like this algorithm might not scale terribly well on
humongous data sets,
at least in comparison to what I know of the speed of e.g. SVD via Lanczos
or aGHA.

It also seems like this isn't going to respect HDFS locality very much,
because the full W and
H matrices are going to need to get moved around a lot, and continually
updated (ie replaced)
with newer values.   Maybe there are some tricks which can make this run
more efficiently,
but I would caution you to think carefully through how you parallelize this
- the most straightforward
approach seems like it is going to entail a *lot* of matrix multiplications.

  -jake

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Zhengguo 'Mike' SUN <zh...@yahoo.com>.
From: Jake Mannix <ja...@gmail.com>
To: mahout-user@lucene.apache.org
Sent: Tue, December 8, 2009 10:05:50 PM
Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

On Tue, Dec 8, 2009 at 6:47 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> Hi Jake,
>
> I am implementing the classical multiplicative update rule of NMF. The
> matrix to be factorized is really big and sparse. Are you suggesting that I
> can use some specialised algorithms for sparse matrix instead of the
> standard multiplication algorithm? But what algorithms are you referring to?
> Could you please provide some pointers?
>

So given your input matrix X, you're trying to find non-negative matrices W
(thin matrix, with few long dense columns) and H (wide matrix, with few long
dense rows) which minimize || X - WH ||, right, where || * || is the
Froebenius norm, right?

Yes.

I'm just suggesting that you don't even compute entries in X - WH where X
has missing data - optimize treating those values as unknown, not "zero".

No, I am not computing entries in X-WH. I am doing iterations on the following two rules:

H <- H*(WtX)/(WtWH),      W<- W*(XHt)/(WHHt)

The detailed algorithm is presented in this paper:

http://www.nips.cc/Web/Groups/NIPS/NIPS2000/00papers-pub-on-web/LeeSeung.ps.gz

  -jake



      

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Jake Mannix <ja...@gmail.com>.
On Tue, Dec 8, 2009 at 6:47 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> Hi Jake,
>
> I am implementing the classical multiplicative update rule of NMF. The
> matrix to be factorized is really big and sparse. Are you suggesting that I
> can use some specialised algorithms for sparse matrix instead of the
> standard multiplication algorithm? But what algorithms are you referring to?
> Could you please provide some pointers?
>

So given your input matrix X, you're trying to find non-negative matrices W
(thin matrix, with few long dense columns) and H (wide matrix, with few long
dense rows) which minimize || X - WH ||, right, where || * || is the
Froebenius norm, right?

I'm just suggesting that you don't even compute entries in X - WH where X
has missing data - optimize treating those values as unknown, not "zero".

  -jake

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Zhengguo 'Mike' SUN <zh...@yahoo.com>.
Hi Jake,

I am implementing the classical multiplicative update rule of NMF. The matrix to be factorized is really big and sparse. Are you suggesting that I can use some specialised algorithms for sparse matrix instead of the standard multiplication algorithm? But what algorithms are you referring to? Could you please provide some pointers?

Thanks




________________________________
From: Jake Mannix <ja...@gmail.com>
To: mahout-user@lucene.apache.org
Sent: Tue, December 8, 2009 8:53:39 PM
Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Hi Mike,

  In the NMF work you're doing, the matrices you're multiplying are very
thin, right?  They're a handful of dense vectors on the left, an equal
number of dense vectors on the right, and you're only computing the values
in the outer product which coincide with the nonzero entries of the (sparse)
input matrix?  In which case you're not really doing the full matrix
multiplication, and doing this a slightly specialized way should provide
some serious speedup.

  Unless you're not dealing with the "really big, but sparse" case, and
minimizing error only over the nonzero inputs?

  -jake

On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> Hi Ted, John,
>
> I am also interested in matrix multiplication. Actually, I am implementing
> Non-negative Matrix Factorization, which basically is iterations of matrix
> multiplication. What I did is exactly as what Ted said: doing an outer
> product using one column from left matrix and one row from right matrix and
> summing up all the results. Idealy, this could be done using only one job:
> mappers doing multiplication and reducers doing summing. But the thing is
> that how do you match a column from left matrix and a row from right matrix.
> Assuming that two input matrices are give in column major order and row
> major order, it seems to me that you have to implement a customized
> InputFormat to do the matching. Or maybe you could use one matrix as input
> and read the other inside mappers, but this also has problem when you have a
> lot of mappers. The straightforward way seems to be using two jobs. The
> mappers of the first job emit the id and the content of the row or column it
>  reads and the reducers do the multiplication. The second job is to do the
> summing. This is kind of similar to what John did. But it is inefficient if
> you have many iterations. Any comments?
>
>
>
>
> ________________________________
> From: Ted Dunning <te...@gmail.com>
> To: j-norstad@northwestern.edu; mahout-user <mahout-user@lucene.apache.org
> >
> Sent: Tue, December 8, 2009 2:59:06 PM
> Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> John,
>
> Your comment about matrix multiplication was forwarded to the mahout-user
> emailing list.
>
> I have a question for you about your approach.  Have you considered doing
> the multiplication in a single step by storing the the first matrix in
> column major order and the second in row major order?  The idea would be to
> do a block-wise outer product and use the combiner to sum elements ahead of
> the reducer.  This could be done by reading a (block) column from the left
> matrix and a (block) row from the second and emitting all of the (block)
> elements of the outer product of this column and row.  The key emitted with
> these blocks would be the i,j index of the final location of each block in
> the final product.  The combiner and reducer could add all of the blocks
> together and the use of the combiner would serve to substantially minimize
> the amount of network traffic.
>
> This alternative is equivalent to inverting the loop nesting in a
> conventional matrix multiplication algorithm.  It is also very closely
> related to the way that cooccurrence counting is typically done in Hadoop.
>
> Is your project an ongoing effort?
>
> ---------- Forwarded message ----------
> From: Robin Anil <ro...@gmail.com>
> Date: Tue, Dec 8, 2009 at 11:09 AM
> Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> To: mahout-user@lucene.apache.org
> Cc: John Norstad <j-...@northwestern.edu>
>
>
> ---------- Forwarded message ----------
> From: John Norstad <j-...@northwestern.edu>
> Date: Tue, Dec 8, 2009 at 10:14 PM
> Subject: A MapReduce Algorithm for Matrix Multiplication
> To: MapReduce <ma...@hadoop.apache.org>
>
>
> As an exercise while learning MapReduce, I developed an algorithm for
> matrix multiplication and wrote it up on my web site. If you're
> interested, it's at:
>
>  http://homepage.mac.com/j.norstad/matrix-multiply
>
> I present the algorithm in English, as pseudo-code, and as Java source
> code for Hadoop.
>
> --
> John Norstad
> Academic and Research Technologies
> Northwestern University
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
>
>
>
>



      

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Jake Mannix <ja...@gmail.com>.
Hi Mike,

  In the NMF work you're doing, the matrices you're multiplying are very
thin, right?  They're a handful of dense vectors on the left, an equal
number of dense vectors on the right, and you're only computing the values
in the outer product which coincide with the nonzero entries of the (sparse)
input matrix?  In which case you're not really doing the full matrix
multiplication, and doing this a slightly specialized way should provide
some serious speedup.

  Unless you're not dealing with the "really big, but sparse" case, and
minimizing error only over the nonzero inputs?

  -jake

On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> Hi Ted, John,
>
> I am also interested in matrix multiplication. Actually, I am implementing
> Non-negative Matrix Factorization, which basically is iterations of matrix
> multiplication. What I did is exactly as what Ted said: doing an outer
> product using one column from left matrix and one row from right matrix and
> summing up all the results. Idealy, this could be done using only one job:
> mappers doing multiplication and reducers doing summing. But the thing is
> that how do you match a column from left matrix and a row from right matrix.
> Assuming that two input matrices are give in column major order and row
> major order, it seems to me that you have to implement a customized
> InputFormat to do the matching. Or maybe you could use one matrix as input
> and read the other inside mappers, but this also has problem when you have a
> lot of mappers. The straightforward way seems to be using two jobs. The
> mappers of the first job emit the id and the content of the row or column it
>  reads and the reducers do the multiplication. The second job is to do the
> summing. This is kind of similar to what John did. But it is inefficient if
> you have many iterations. Any comments?
>
>
>
>
> ________________________________
> From: Ted Dunning <te...@gmail.com>
> To: j-norstad@northwestern.edu; mahout-user <mahout-user@lucene.apache.org
> >
> Sent: Tue, December 8, 2009 2:59:06 PM
> Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> John,
>
> Your comment about matrix multiplication was forwarded to the mahout-user
> emailing list.
>
> I have a question for you about your approach.  Have you considered doing
> the multiplication in a single step by storing the the first matrix in
> column major order and the second in row major order?  The idea would be to
> do a block-wise outer product and use the combiner to sum elements ahead of
> the reducer.  This could be done by reading a (block) column from the left
> matrix and a (block) row from the second and emitting all of the (block)
> elements of the outer product of this column and row.  The key emitted with
> these blocks would be the i,j index of the final location of each block in
> the final product.  The combiner and reducer could add all of the blocks
> together and the use of the combiner would serve to substantially minimize
> the amount of network traffic.
>
> This alternative is equivalent to inverting the loop nesting in a
> conventional matrix multiplication algorithm.  It is also very closely
> related to the way that cooccurrence counting is typically done in Hadoop.
>
> Is your project an ongoing effort?
>
> ---------- Forwarded message ----------
> From: Robin Anil <ro...@gmail.com>
> Date: Tue, Dec 8, 2009 at 11:09 AM
> Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> To: mahout-user@lucene.apache.org
> Cc: John Norstad <j-...@northwestern.edu>
>
>
> ---------- Forwarded message ----------
> From: John Norstad <j-...@northwestern.edu>
> Date: Tue, Dec 8, 2009 at 10:14 PM
> Subject: A MapReduce Algorithm for Matrix Multiplication
> To: MapReduce <ma...@hadoop.apache.org>
>
>
> As an exercise while learning MapReduce, I developed an algorithm for
> matrix multiplication and wrote it up on my web site. If you're
> interested, it's at:
>
>   http://homepage.mac.com/j.norstad/matrix-multiply
>
> I present the algorithm in English, as pseudo-code, and as Java source
> code for Hadoop.
>
> --
> John Norstad
> Academic and Research Technologies
> Northwestern University
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
>
>
>
>

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Zhengguo 'Mike' SUN <zh...@yahoo.com>.
Thanks for your pointers, Jake. 

I am still working on the implementation. Certainly I would like to contribute after I finish my implementation.



________________________________
From: Jake Mannix <ja...@gmail.com>
To: mahout-user@lucene.apache.org
Cc: j-norstad@northwestern.edu
Sent: Tue, December 8, 2009 10:16:31 PM
Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

On Tue, Dec 8, 2009 at 6:49 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> I appreciate if you could provide some pointers.
>

Of course, we'd also appreciate it if you could contribute that code here to
Mahout, as well, if
possible.  It'd be great to have that here for everyone to use, as I'm sure
that having the choice of
doing NMF as well as SVD or LSI or pLSI would very helpful for the rest of
the community!

Patches welcome!  Even if your first impls don't do any of the tricks we're
describing, if you
contribute it, we can help you improve it.

  -jake


>
>
> ________________________________
> From: Jake Mannix <ja...@gmail.com>
> To: mahout-user@lucene.apache.org
> Cc: j-norstad@northwestern.edu
> Sent: Tue, December 8, 2009 9:08:19 PM
> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> On Tue, Dec 8, 2009 at 5:56 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > NMF should be amenable to the stochastic decomposition trick.  That
> reduces
> > the problem to a much smaller factorization that you could plausibly do
> > using sequential techniques.  Jake Mannix is working on getting
> > factorizations going, but I don't know if he has gotten to NMF.
> >
>
> I'm not currently working on NMF, but the stochastic decomposition trick
> will be
> in there soon, which should allow all this pretty easily.
>
> Although... if you start with a positive matrix, you may want a specialized
> random
> projector which preserves positivity for this kind of thing.  But I'm not
> sure, I
> haven't looked too closely at what happens when you try to do this trick on
> NMF.
>
>  -jake
>
>
>
>
>



      

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Jake Mannix <ja...@gmail.com>.
On Tue, Dec 8, 2009 at 6:49 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> I appreciate if you could provide some pointers.
>

Of course, we'd also appreciate it if you could contribute that code here to
Mahout, as well, if
possible.  It'd be great to have that here for everyone to use, as I'm sure
that having the choice of
doing NMF as well as SVD or LSI or pLSI would very helpful for the rest of
the community!

Patches welcome!  Even if your first impls don't do any of the tricks we're
describing, if you
contribute it, we can help you improve it.

  -jake


>
>
> ________________________________
> From: Jake Mannix <ja...@gmail.com>
> To: mahout-user@lucene.apache.org
> Cc: j-norstad@northwestern.edu
> Sent: Tue, December 8, 2009 9:08:19 PM
> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> On Tue, Dec 8, 2009 at 5:56 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > NMF should be amenable to the stochastic decomposition trick.  That
> reduces
> > the problem to a much smaller factorization that you could plausibly do
> > using sequential techniques.  Jake Mannix is working on getting
> > factorizations going, but I don't know if he has gotten to NMF.
> >
>
> I'm not currently working on NMF, but the stochastic decomposition trick
> will be
> in there soon, which should allow all this pretty easily.
>
> Although... if you start with a positive matrix, you may want a specialized
> random
> projector which preserves positivity for this kind of thing.  But I'm not
> sure, I
> haven't looked too closely at what happens when you try to do this trick on
> NMF.
>
>   -jake
>
>
>
>
>

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Jake Mannix <ja...@gmail.com>.
On Tue, Dec 8, 2009 at 6:49 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> What stochastic decomposition trick are you guys referring to? I appreciate
> if you could provide some pointers.
>

http://arxiv.org/abs/0909.4061

Basically the trick is: first project your column space (or row space, but
let's think of the rows as your
data points) randomly down from gigantically huge down to some small, but
still larger than the final
dimension you want to reduce to (say: project from millions of columns down
to thousands, where you
want to end up with a rank-300 factorization).  Then take your num_rows by
small_num matrix and square
it, leaving you with a small_num by small_num matrix you can decompose in
memory using whatever
techniques you know to work on small, but dense, matrices.

It works because of the Johnson-Lindenstrauss
lemma<http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma>.


  -jake


>
>
> ________________________________
> From: Jake Mannix <ja...@gmail.com>
> To: mahout-user@lucene.apache.org
> Cc: j-norstad@northwestern.edu
> Sent: Tue, December 8, 2009 9:08:19 PM
> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> On Tue, Dec 8, 2009 at 5:56 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > NMF should be amenable to the stochastic decomposition trick.  That
> reduces
> > the problem to a much smaller factorization that you could plausibly do
> > using sequential techniques.  Jake Mannix is working on getting
> > factorizations going, but I don't know if he has gotten to NMF.
> >
>
> I'm not currently working on NMF, but the stochastic decomposition trick
> will be
> in there soon, which should allow all this pretty easily.
>
> Although... if you start with a positive matrix, you may want a specialized
> random
> projector which preserves positivity for this kind of thing.  But I'm not
> sure, I
> haven't looked too closely at what happens when you try to do this trick on
> NMF.
>
>   -jake
>
>
>
>
>

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Zhengguo 'Mike' SUN <zh...@yahoo.com>.
What stochastic decomposition trick are you guys referring to? I appreciate if you could provide some pointers.


________________________________
From: Jake Mannix <ja...@gmail.com>
To: mahout-user@lucene.apache.org
Cc: j-norstad@northwestern.edu
Sent: Tue, December 8, 2009 9:08:19 PM
Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

On Tue, Dec 8, 2009 at 5:56 PM, Ted Dunning <te...@gmail.com> wrote:

> NMF should be amenable to the stochastic decomposition trick.  That reduces
> the problem to a much smaller factorization that you could plausibly do
> using sequential techniques.  Jake Mannix is working on getting
> factorizations going, but I don't know if he has gotten to NMF.
>

I'm not currently working on NMF, but the stochastic decomposition trick
will be
in there soon, which should allow all this pretty easily.

Although... if you start with a positive matrix, you may want a specialized
random
projector which preserves positivity for this kind of thing.  But I'm not
sure, I
haven't looked too closely at what happens when you try to do this trick on
NMF.

  -jake



      

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Jake Mannix <ja...@gmail.com>.
On Tue, Dec 8, 2009 at 5:56 PM, Ted Dunning <te...@gmail.com> wrote:

> NMF should be amenable to the stochastic decomposition trick.  That reduces
> the problem to a much smaller factorization that you could plausibly do
> using sequential techniques.  Jake Mannix is working on getting
> factorizations going, but I don't know if he has gotten to NMF.
>

I'm not currently working on NMF, but the stochastic decomposition trick
will be
in there soon, which should allow all this pretty easily.

Although... if you start with a positive matrix, you may want a specialized
random
projector which preserves positivity for this kind of thing.  But I'm not
sure, I
haven't looked too closely at what happens when you try to do this trick on
NMF.

  -jake

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Ted Dunning <te...@gmail.com>.
NMF should be amenable to the stochastic decomposition trick.  That reduces
the problem to a much smaller factorization that you could plausibly do
using sequential techniques.  Jake Manning is working on getting
factorizations going, but I don't know if he has gotten to NMF.

Jake's comment about doing the multiplication cleanly using sparse
techniques is important, but you should also remember that if you have a
sparse matrix stored in triple form, doing a transpose is simply a matter of
changing which index your consider a row and which a column.  For some
algorithms, it might pay to sort the result once so that you can do a
sort-join for the multiplication, but that isn't hard.


On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
<zh...@yahoo.com>wrote:

> Hi Ted, John,
>
> I am also interested in matrix multiplication. Actually, I am implementing
> Non-negative Matrix Factorization, which basically is iterations of matrix
> multiplication. What I did is exactly as what Ted said: doing an outer
> product using one column from left matrix and one row from right matrix and
> summing up all the results. Idealy, this could be done using only one job:
> mappers doing multiplication and reducers doing summing. But the thing is
> that how do you match a column from left matrix and a row from right matrix.
> Assuming that two input matrices are give in column major order and row
> major order, it seems to me that you have to implement a customized
> InputFormat to do the matching. Or maybe you could use one matrix as input
> and read the other inside mappers, but this also has problem when you have a
> lot of mappers. The straightforward way seems to be using two jobs. The
> mappers of the first job emit the id and the content of the row or column it
> reads and the reducers do the multiplication. The second job is to do the
> summing. This is kind of similar to what John did. But it is inefficient if
> you have many iterations. Any comments?
>
>
>

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Zhengguo 'Mike' SUN <zh...@yahoo.com>.
Hi Ted, John,

I am also interested in matrix multiplication. Actually, I am implementing Non-negative Matrix Factorization, which basically is iterations of matrix multiplication. What I did is exactly as what Ted said: doing an outer product using one column from left matrix and one row from right matrix and summing up all the results. Idealy, this could be done using only one job: mappers doing multiplication and reducers doing summing. But the thing is that how do you match a column from left matrix and a row from right matrix. Assuming that two input matrices are give in column major order and row major order, it seems to me that you have to implement a customized InputFormat to do the matching. Or maybe you could use one matrix as input and read the other inside mappers, but this also has problem when you have a lot of mappers. The straightforward way seems to be using two jobs. The mappers of the first job emit the id and the content of the row or column it
 reads and the reducers do the multiplication. The second job is to do the summing. This is kind of similar to what John did. But it is inefficient if you have many iterations. Any comments?




________________________________
From: Ted Dunning <te...@gmail.com>
To: j-norstad@northwestern.edu; mahout-user <ma...@lucene.apache.org>
Sent: Tue, December 8, 2009 2:59:06 PM
Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication

John,

Your comment about matrix multiplication was forwarded to the mahout-user
emailing list.

I have a question for you about your approach.  Have you considered doing
the multiplication in a single step by storing the the first matrix in
column major order and the second in row major order?  The idea would be to
do a block-wise outer product and use the combiner to sum elements ahead of
the reducer.  This could be done by reading a (block) column from the left
matrix and a (block) row from the second and emitting all of the (block)
elements of the outer product of this column and row.  The key emitted with
these blocks would be the i,j index of the final location of each block in
the final product.  The combiner and reducer could add all of the blocks
together and the use of the combiner would serve to substantially minimize
the amount of network traffic.

This alternative is equivalent to inverting the loop nesting in a
conventional matrix multiplication algorithm.  It is also very closely
related to the way that cooccurrence counting is typically done in Hadoop.

Is your project an ongoing effort?

---------- Forwarded message ----------
From: Robin Anil <ro...@gmail.com>
Date: Tue, Dec 8, 2009 at 11:09 AM
Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
To: mahout-user@lucene.apache.org
Cc: John Norstad <j-...@northwestern.edu>


---------- Forwarded message ----------
From: John Norstad <j-...@northwestern.edu>
Date: Tue, Dec 8, 2009 at 10:14 PM
Subject: A MapReduce Algorithm for Matrix Multiplication
To: MapReduce <ma...@hadoop.apache.org>


As an exercise while learning MapReduce, I developed an algorithm for
matrix multiplication and wrote it up on my web site. If you're
interested, it's at:

  http://homepage.mac.com/j.norstad/matrix-multiply

I present the algorithm in English, as pseudo-code, and as Java source
code for Hadoop.

--
John Norstad
Academic and Research Technologies
Northwestern University



-- 
Ted Dunning, CTO
DeepDyve



      

Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Ted Dunning <te...@gmail.com>.
John,

Your comment about matrix multiplication was forwarded to the mahout-user
emailing list.

I have a question for you about your approach.  Have you considered doing
the multiplication in a single step by storing the the first matrix in
column major order and the second in row major order?  The idea would be to
do a block-wise outer product and use the combiner to sum elements ahead of
the reducer.  This could be done by reading a (block) column from the left
matrix and a (block) row from the second and emitting all of the (block)
elements of the outer product of this column and row.  The key emitted with
these blocks would be the i,j index of the final location of each block in
the final product.  The combiner and reducer could add all of the blocks
together and the use of the combiner would serve to substantially minimize
the amount of network traffic.

This alternative is equivalent to inverting the loop nesting in a
conventional matrix multiplication algorithm.  It is also very closely
related to the way that cooccurrence counting is typically done in Hadoop.

Is your project an ongoing effort?

---------- Forwarded message ----------
From: Robin Anil <ro...@gmail.com>
Date: Tue, Dec 8, 2009 at 11:09 AM
Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
To: mahout-user@lucene.apache.org
Cc: John Norstad <j-...@northwestern.edu>


---------- Forwarded message ----------
From: John Norstad <j-...@northwestern.edu>
Date: Tue, Dec 8, 2009 at 10:14 PM
Subject: A MapReduce Algorithm for Matrix Multiplication
To: MapReduce <ma...@hadoop.apache.org>


As an exercise while learning MapReduce, I developed an algorithm for
matrix multiplication and wrote it up on my web site. If you're
interested, it's at:

  http://homepage.mac.com/j.norstad/matrix-multiply

I present the algorithm in English, as pseudo-code, and as Java source
code for Hadoop.

--
John Norstad
Academic and Research Technologies
Northwestern University



-- 
Ted Dunning, CTO
DeepDyve

Fwd: A MapReduce Algorithm for Matrix Multiplication

Posted by Robin Anil <ro...@gmail.com>.
---------- Forwarded message ----------
From: John Norstad <j-...@northwestern.edu>
Date: Tue, Dec 8, 2009 at 10:14 PM
Subject: A MapReduce Algorithm for Matrix Multiplication
To: MapReduce <ma...@hadoop.apache.org>


As an exercise while learning MapReduce, I developed an algorithm for
matrix multiplication and wrote it up on my web site. If you're
interested, it's at:

  http://homepage.mac.com/j.norstad/matrix-multiply

I present the algorithm in English, as pseudo-code, and as Java source
code for Hadoop.

--
John Norstad
Academic and Research Technologies
Northwestern University