Posted to user@mahout.apache.org by "Young Y. Kim" <yo...@gmail.com> on 2010/04/12 10:43:15 UTC

DistributedRowMatrix mult problem

I'm trying to test DistributedRowMatrix in Eclipse for matrix calculation in Hadoop.
A =
[[85,68,30,15,50,34],
[53,38,19,70,90,29],
[20,83,19,38,82,34],
[67,50,68,86,64,53],
[84,71,30,85,82,73],
[2,43,54,50,66,31]]

DistributedRowMatrix m = new DistributedRowMatrix(path, ...);

I checked the values of m by iterating over it, and they are fine.
The result of m.transpose() was also correct.
But the result of m.transpose().mult(m) is wrong.

It should be the following (computed in Python):
>>> A.transpose()*A
matrix([[21983, 18854, 11121, 18747, 21968, 14852],
        [18854, 22347, 12191, 19319, 25486, 15402],
        [11121, 12191, 10062, 13600, 15144,  9685],
        [18747, 19319, 13600, 23690, 25940, 16145],
        [21968, 25486, 15144, 25940, 32500, 18522],
        [14852, 15402,  9685, 16145, 18522, 12252]])
But the Mahout result is:

mTm =
0:16702.0    1:19207.0    2:15981.0    3:20949.0    4:24232.0    5:12485.0
0:16616.0    1:17762.0    2:15275.0    3:23223.0    4:24111.0    5:14771.0
0:8768.0    1:11699.0    2:9418.0    3:14882.0    4:16621.0    5:8957.0
0:14415.0    1:19297.0    2:18770.0    3:22575.0    4:25300.0    5:16552.0
0:20134.0    1:21402.0    2:21428.0    3:27676.0    4:30032.0    5:19056.0
0:11381.0    1:14729.0    2:12787.0    3:16913.0    4:18689.0    5:11580.0

What's the problem?

Thanks.

P.S. The source code is very simple:
....
        DistributedRowMatrix m = new DistributedRowMatrix("/tmp/testdata/6x6.mat", "/tmp/testdata/tmpOut", 6, 6);
        m.configure(new JobConf());

        System.out.println("original matrix = ");
        printMatrix(m); // matrix printing

        DistributedRowMatrix mT = m.transpose();
        System.out.println("mT = ");
        printMatrix(mT);

        DistributedRowMatrix mTm = mT.times(m);
        System.out.println("mTm = ");
        printMatrix(mTm);
...
or     printMatrix(m.transpose().mult(m));

Re: DistributedRowMatrix mult problem

Posted by Jake Mannix <ja...@gmail.com>.
Hi Young,

  The problem is one of documentation, and poor naming of the method:

DistributedRowMatrix.times(DistributedRowMatrix m)

should be called

DistributedRowMatrix.transposeTimes(DistributedRowMatrix m),

as it computes a.transpose().times(b), not a.times(b).

See the javadocs for the interface method:

http://lucene.apache.org/mahout/javadoc/mahout-math/org/apache/mahout/math/VectorIterable.html
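To make the naming issue concrete, here is a minimal plain-Java sketch on dense arrays, with no Hadoop or Mahout involved. The names TransposeTimesDemo and transposeTimes are hypothetical illustrations, not Mahout API; the method simply computes a.transpose().times(b) as described above. Called with (A^T, A), as in the question, it yields A*A, whose diagonal (16702, 17762, 9418, 22575, 30032, 11580) is exactly the diagonal Mahout printed. Called with (A, A), i.e. with the transpose step left off, its first row is 21983, 18854, 11121, 18747, 21968, 14852, matching the Python result.

```java
import java.util.Arrays;

// Hypothetical names for illustration only; this is not Mahout's API.
class TransposeTimesDemo {

    // The 6x6 matrix A from the question.
    static final long[][] A = {
        {85, 68, 30, 15, 50, 34},
        {53, 38, 19, 70, 90, 29},
        {20, 83, 19, 38, 82, 34},
        {67, 50, 68, 86, 64, 53},
        {84, 71, 30, 85, 82, 73},
        {2, 43, 54, 50, 66, 31}
    };

    // Computes a.transpose().times(b) without building the transpose:
    // result[i][j] = sum over rows k of a[k][i] * b[k][j].
    static long[][] transposeTimes(long[][] a, long[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("a and b must have the same number of rows");
        }
        int n = a[0].length;
        int m = b[0].length;
        long[][] result = new long[n][m];
        for (int k = 0; k < a.length; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    result[i][j] += a[k][i] * b[k][j];
                }
            }
        }
        return result;
    }

    // An explicit transpose, standing in for m.transpose().
    static long[][] transpose(long[][] a) {
        long[][] t = new long[a[0].length][a.length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[0].length; c++) {
                t[c][r] = a[r][c];
            }
        }
        return t;
    }

    // Extracts the main diagonal of a square matrix.
    static long[] diagonal(long[][] x) {
        long[] d = new long[x.length];
        for (int i = 0; i < x.length; i++) {
            d[i] = x[i][i];
        }
        return d;
    }

    public static void main(String[] args) {
        // As in the question: mT.times(m) is (A^T)^T * A, i.e. A * A.
        long[][] asWritten = transposeTimes(transpose(A), A);
        // Transpose step left off: m.times(m) is A^T * A.
        long[][] intended = transposeTimes(A, A);

        System.out.println("diagonal of A*A: " + Arrays.toString(diagonal(asWritten)));
        System.out.println("row 0 of A^T*A:  " + Arrays.toString(intended[0]));
    }
}
```

With Java 11 or later this can be run directly as `java TransposeTimesDemo.java`; it prints the two rows of numbers quoted above.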

The reason is that the most efficient way to multiply sparse matrices in a
distributed setting is with a map-side join. If the first matrix "A" is
represented as a distributed *ROW* matrix, then what that join computes, in
one map-reduce pass, is A.transpose().times(B): it simply pretends the rows
are actually columns (i.e. it takes an O(1) virtual transpose).
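The one-pass scheme described above can be sketched as a single-process Java simulation, assuming both matrices are keyed by row index so that row k of A and row k of B can be paired locally, which is the precondition for a map-side join. The class and method names are hypothetical, and this is not Mahout's actual job code; it only illustrates why reading each row of A as a column of A^T gives A^T*B without ever building a transpose. Applied to (A, A) it reproduces the Python result.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of a map-side join multiply; not Mahout code.
class MapSideJoinSketch {

    // The 6x6 matrix A from the question, one row per key.
    static final double[][] A = {
        {85, 68, 30, 15, 50, 34},
        {53, 38, 19, 70, 90, 29},
        {20, 83, 19, 38, 82, 34},
        {67, 50, 68, 86, 64, 53},
        {84, 71, 30, 85, 82, 73},
        {2, 43, 54, 50, 66, 31}
    };

    // Sparse row: column index -> nonzero value.
    static Map<Integer, Double> sparseRow(double[] dense) {
        Map<Integer, Double> row = new HashMap<>();
        for (int c = 0; c < dense.length; c++) {
            if (dense[c] != 0.0) {
                row.put(c, dense[c]);
            }
        }
        return row;
    }

    // Row k of a and row k of b share the key k, so they are paired directly (the "join").
    // For each nonzero a[k][i], the vector a[k][i] * b[k] is what a mapper would emit under
    // key i; here it is added straight into the accumulator for key i (the "reduce").
    // The accumulated vector for key i is row i of a^T * b.
    static TreeMap<Integer, double[]> multiply(double[][] a, double[][] b) {
        int outCols = b[0].length;
        TreeMap<Integer, double[]> reduced = new TreeMap<>();
        for (int k = 0; k < a.length; k++) {
            Map<Integer, Double> aRow = sparseRow(a[k]);
            double[] bRow = b[k];
            for (Map.Entry<Integer, Double> e : aRow.entrySet()) {
                double[] acc = reduced.computeIfAbsent(e.getKey(), key -> new double[outCols]);
                for (int j = 0; j < outCols; j++) {
                    acc[j] += e.getValue() * bRow[j];
                }
            }
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Rows of A are read as columns of A^T; no transposed copy is ever built.
        TreeMap<Integer, double[]> result = multiply(A, A);
        for (Map.Entry<Integer, double[]> row : result.entrySet()) {
            System.out.println(row.getKey() + ": " + Arrays.toString(row.getValue()));
        }
    }
}
```

In this sketch, zero entries in a row of A generate no work at all, and output rows that receive no contributions are simply absent from the result, which is why this style of multiplication suits sparse inputs.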

To check that you can get what you want, try leaving the transpose step out
of your example when using the distributed matrices, and see whether you get
the same answer (you should, because this is exactly what the unit tests for
this class do).

  -jake
