Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2009/01/05 02:30:27 UTC

Large matrices multiplication problem.

Hama trunk doesn't work for large matrix multiplication: it fails with
mapred.task.timeout and scanner.timeout exceptions. I tried a 1,000,000 *
1,000,000 matrix multiplication on 100 nodes. (The rest works fine.)

To reduce reads of duplicated blocks, I thought of the approach described
below. But each map task seems too large.

----
// c[i][k] += a[i][j] * b[j][k];

map() {
  SubMatrix a = value.get();

  for (RowResult row : scan) {
    // emit a partial product keyed by (i, k)
    collect(c[i][k], a[i][j] * b[j][k]);
  }
}

reduce() {
  // sum the partial products for each c[i][k]
  c[i][k] = sum(collected partial products);
}
----
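For reference, here is the same pattern as a plain-Python simulation (illustrative only, not Hama code): the "map" phase emits one partial product per (i, k) cell, and the "reduce" phase sums them.

```python
# Plain-Python sketch of the map/reduce matrix multiplication pattern
# above (illustrative simulation, not Hama code).
from collections import defaultdict

def mapreduce_matmul(a, b):
    n = len(a)
    emitted = defaultdict(list)
    # "map": for each a[i][j], scan the matching row j of b and
    # emit a partial product keyed by the output cell (i, k)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                emitted[(i, k)].append(a[i][j] * b[j][k])
    # "reduce": sum the partial products for each c[i][k]
    return {key: sum(vals) for key, vals in emitted.items()}
```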

Should we increase mapred.task.timeout and scanner.timeout?
Or does anyone have a better idea?

-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Large matrices multiplication problem.

Posted by "Edward J. Yoon" <ed...@apache.org>.
I tried multiplying 8,000 by 8,000 matrices on 3 nodes using the HAMA-129 patch.

There are three jobs.

1. Collect the 40 * 40 blocks of matrixA into collectionTable.
2. Collect the 40 * 40 blocks of matrixB into collectionTable.

Then collectionTable will have 64,000 (40^3) rows, as described below.

rowKey           block:a      block:b
c(0, 0)-0        a(0, 0)      b(0, 0)
c(0, 0)-1        a(0, 1)      b(1, 0)
...
c(39, 39)-63999  a(39, 39)    b(39, 39)

3. In the map(), multiply the paired sub-matrices. In the reduce(), compute the sum for each c(i, j).

So there are no network requests, and we get a locality benefit while
running job no. 3.
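The three-job scheme above can be simulated end to end in plain Python (illustrative only, not Hama code; the names are mine): jobs 1 and 2 build the collectionTable rows keyed c(i, k)-t, each pairing a(i, j) with b(j, k); job 3's map multiplies the two blocks on each row and its reduce sums the partial products per c(i, k).

```python
# Self-contained simulation of the three-job block multiplication
# scheme described above (illustrative, not Hama code).
from collections import defaultdict

def mat_blocks(m, nb, bs):
    """Split a square list-of-lists matrix into nb x nb blocks of size bs."""
    return {(i, j): [[m[i * bs + r][j * bs + c] for c in range(bs)]
                     for r in range(bs)]
            for i in range(nb) for j in range(nb)}

def block_mult(x, y, bs):
    return [[sum(x[r][t] * y[t][c] for t in range(bs)) for c in range(bs)]
            for r in range(bs)]

def block_add(x, y, bs):
    return [[x[r][c] + y[r][c] for c in range(bs)] for r in range(bs)]

def multiply(a, b, n, bs):
    nb = n // bs
    A, B = mat_blocks(a, nb, bs), mat_blocks(b, nb, bs)
    # jobs 1 + 2: build collectionTable rows keyed c(i, k)-t, pairing
    # a(i, j) with b(j, k); nb**3 rows in total (64,000 for 40 blocks)
    table, t = {}, 0
    for i in range(nb):
        for k in range(nb):
            for j in range(nb):
                table[(i, k, t)] = (A[(i, j)], B[(j, k)])
                t += 1
    # job 3 map: multiply the two blocks stored on each row
    partials = defaultdict(list)
    for (i, k, _), (ab, bb) in table.items():
        partials[(i, k)].append(block_mult(ab, bb, bs))
    # job 3 reduce: sum the partial products for each c(i, k)
    C = {}
    for key, ps in partials.items():
        acc = ps[0]
        for p in ps[1:]:
            acc = block_add(acc, p, bs)
        C[key] = acc
    return C
```

Since every map task reads only its own row, no cross-region reads are needed, which is the locality point made above.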

----
Test result:

[d8g053:/root/hama-trunk]# bin/hama examples mult -m 10 -r 10 eightA eightB 1600
09/01/13 13:40:12 INFO hama.AbstractMatrix: Initializing the matrix storage.
09/01/13 13:40:17 INFO hama.AbstractMatrix: Create Matrix DenseMatrix_randujbhy
09/01/13 13:40:17 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/01/13 13:40:17 WARN mapred.JobClient: Use genericOptions for the
option -libjars
09/01/13 13:40:17 WARN mapred.JobClient: No job jar file set.  User
classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
0->d8g055.nhncorp.com:,000000000000850
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
1->d8g054.nhncorp.com:000000000000850,000000000001802
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
2->d8g054.nhncorp.com:000000000001802,000000000002505
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
3->d8g055.nhncorp.com:000000000002505,000000000003213
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
4->d8g053.nhncorp.com:000000000003213,000000000003952
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
5->d8g055.nhncorp.com:000000000003952,000000000004775
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
6->d8g053.nhncorp.com:000000000004775,000000000005602
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
7->d8g053.nhncorp.com:000000000005602,000000000006644
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
8->d8g054.nhncorp.com:000000000006644,000000000007321
09/01/13 13:40:17 INFO mapred.TableInputFormatBase: split:
9->d8g054.nhncorp.com:000000000007321,
09/01/13 13:40:17 INFO mapred.JobClient: Running job: job_200901131229_0008
09/01/13 13:40:18 INFO mapred.JobClient:  map 0% reduce 0%
09/01/13 13:42:38 INFO mapred.JobClient:  map 10% reduce 0%
09/01/13 13:42:45 INFO mapred.JobClient:  map 20% reduce 0%
09/01/13 13:42:47 INFO mapred.JobClient:  map 20% reduce 1%
09/01/13 13:42:49 INFO mapred.JobClient:  map 30% reduce 2%
09/01/13 13:42:59 INFO mapred.JobClient:  map 30% reduce 3%
09/01/13 13:43:00 INFO mapred.JobClient:  map 30% reduce 4%
09/01/13 13:43:03 INFO mapred.JobClient:  map 30% reduce 5%
09/01/13 13:43:05 INFO mapred.JobClient:  map 30% reduce 6%
09/01/13 13:43:08 INFO mapred.JobClient:  map 40% reduce 6%
09/01/13 13:43:13 INFO mapred.JobClient:  map 50% reduce 6%
09/01/13 13:43:16 INFO mapred.JobClient:  map 60% reduce 6%
09/01/13 13:43:21 INFO mapred.JobClient:  map 60% reduce 8%
09/01/13 13:43:23 INFO mapred.JobClient:  map 60% reduce 10%
09/01/13 13:43:28 INFO mapred.JobClient:  map 60% reduce 11%
09/01/13 13:43:33 INFO mapred.JobClient:  map 60% reduce 12%
09/01/13 13:44:59 INFO mapred.JobClient:  map 70% reduce 12%
09/01/13 13:45:07 INFO mapred.JobClient:  map 80% reduce 12%
09/01/13 13:45:12 INFO mapred.JobClient:  map 80% reduce 13%
09/01/13 13:45:14 INFO mapred.JobClient:  map 80% reduce 14%
09/01/13 13:45:19 INFO mapred.JobClient:  map 80% reduce 15%
09/01/13 13:45:28 INFO mapred.JobClient:  map 90% reduce 15%
09/01/13 13:45:37 INFO mapred.JobClient:  map 90% reduce 16%
09/01/13 13:45:42 INFO mapred.JobClient:  map 90% reduce 17%
09/01/13 13:45:43 INFO mapred.JobClient:  map 100% reduce 17%
09/01/13 13:45:49 INFO mapred.JobClient:  map 100% reduce 21%
09/01/13 13:45:53 INFO mapred.JobClient:  map 100% reduce 25%
09/01/13 13:45:54 INFO mapred.JobClient:  map 100% reduce 29%
09/01/13 13:45:55 INFO mapred.JobClient:  map 100% reduce 32%
09/01/13 13:45:58 INFO mapred.JobClient:  map 100% reduce 36%
09/01/13 13:45:59 INFO mapred.JobClient:  map 100% reduce 40%
09/01/13 13:46:35 INFO mapred.JobClient:  map 100% reduce 41%
09/01/13 13:47:28 INFO mapred.JobClient:  map 100% reduce 42%
09/01/13 13:48:32 INFO mapred.JobClient:  map 100% reduce 43%
09/01/13 13:49:44 INFO mapred.JobClient:  map 100% reduce 44%
09/01/13 13:50:35 INFO mapred.JobClient:  map 100% reduce 45%
09/01/13 13:51:25 INFO mapred.JobClient:  map 100% reduce 46%
09/01/13 13:52:29 INFO mapred.JobClient:  map 100% reduce 47%
09/01/13 13:53:39 INFO mapred.JobClient:  map 100% reduce 48%
09/01/13 13:54:35 INFO mapred.JobClient:  map 100% reduce 49%
09/01/13 13:55:50 INFO mapred.JobClient:  map 100% reduce 50%
09/01/13 13:56:33 INFO mapred.JobClient:  map 100% reduce 51%
09/01/13 13:57:26 INFO mapred.JobClient:  map 100% reduce 52%
09/01/13 13:58:26 INFO mapred.JobClient:  map 100% reduce 53%
09/01/13 13:59:19 INFO mapred.JobClient:  map 100% reduce 54%
09/01/13 14:00:08 INFO mapred.JobClient:  map 100% reduce 55%
09/01/13 14:00:13 INFO mapred.JobClient:  map 100% reduce 57%
09/01/13 14:00:18 INFO mapred.JobClient:  map 100% reduce 58%
09/01/13 14:00:28 INFO mapred.JobClient:  map 100% reduce 61%
09/01/13 14:00:35 INFO mapred.JobClient:  map 100% reduce 62%
09/01/13 14:01:19 INFO mapred.JobClient:  map 100% reduce 63%
09/01/13 14:02:03 INFO mapred.JobClient:  map 100% reduce 64%
09/01/13 14:03:02 INFO mapred.JobClient:  map 100% reduce 65%
09/01/13 14:03:42 INFO mapred.JobClient:  map 100% reduce 66%
09/01/13 14:04:53 INFO mapred.JobClient:  map 100% reduce 67%
09/01/13 14:05:31 INFO mapred.JobClient:  map 100% reduce 69%
09/01/13 14:05:36 INFO mapred.JobClient:  map 100% reduce 70%
09/01/13 14:05:41 INFO mapred.JobClient:  map 100% reduce 74%
09/01/13 14:05:59 INFO mapred.JobClient:  map 100% reduce 75%
09/01/13 14:06:38 INFO mapred.JobClient:  map 100% reduce 78%
09/01/13 14:06:42 INFO mapred.JobClient:  map 100% reduce 79%
09/01/13 14:06:43 INFO mapred.JobClient:  map 100% reduce 82%
09/01/13 14:06:47 INFO mapred.JobClient:  map 100% reduce 83%
09/01/13 14:06:52 INFO mapred.JobClient:  map 100% reduce 85%
09/01/13 14:06:58 INFO mapred.JobClient:  map 100% reduce 86%
09/01/13 14:07:02 INFO mapred.JobClient:  map 100% reduce 89%
09/01/13 14:07:21 INFO mapred.JobClient:  map 100% reduce 90%
09/01/13 14:08:21 INFO mapred.JobClient:  map 100% reduce 91%
09/01/13 14:09:17 INFO mapred.JobClient:  map 100% reduce 92%
09/01/13 14:10:37 INFO mapred.JobClient:  map 100% reduce 93%
09/01/13 14:11:59 INFO mapred.JobClient:  map 100% reduce 94%
09/01/13 14:13:34 INFO mapred.JobClient:  map 100% reduce 95%
09/01/13 14:14:44 INFO mapred.JobClient:  map 100% reduce 96%
09/01/13 14:16:01 INFO mapred.JobClient:  map 100% reduce 97%
09/01/13 14:17:32 INFO mapred.JobClient:  map 100% reduce 98%
09/01/13 14:19:07 INFO mapred.JobClient:  map 100% reduce 99%
09/01/13 14:20:29 INFO mapred.JobClient:  map 100% reduce 100%
09/01/13 14:20:41 INFO mapred.JobClient: Job complete: job_200901131229_0008
09/01/13 14:20:41 INFO mapred.JobClient: Counters: 15
09/01/13 14:20:41 INFO mapred.JobClient:   File Systems
09/01/13 14:20:41 INFO mapred.JobClient:     Local bytes read=4355105034
09/01/13 14:20:41 INFO mapred.JobClient:     Local bytes written=6530991934
09/01/13 14:20:41 INFO mapred.JobClient:   Job Counters
09/01/13 14:20:41 INFO mapred.JobClient:     Launched reduce tasks=12
09/01/13 14:20:41 INFO mapred.JobClient:     Rack-local map tasks=4
09/01/13 14:20:41 INFO mapred.JobClient:     Launched map tasks=14
09/01/13 14:20:41 INFO mapred.JobClient:     Data-local map tasks=10
09/01/13 14:20:41 INFO mapred.JobClient:   Map-Reduce Framework
09/01/13 14:20:41 INFO mapred.JobClient:     Reduce input groups=1600
09/01/13 14:20:41 INFO mapred.JobClient:     Combine output records=0
09/01/13 14:20:41 INFO mapred.JobClient:     Map input records=8000
09/01/13 14:20:41 INFO mapred.JobClient:     Reduce output records=64000
09/01/13 14:20:41 INFO mapred.JobClient:     Map output bytes=2175715600
09/01/13 14:20:41 INFO mapred.JobClient:     Map input bytes=0
09/01/13 14:20:41 INFO mapred.JobClient:     Combine input records=0
09/01/13 14:20:41 INFO mapred.JobClient:     Map output records=320000
09/01/13 14:20:41 INFO mapred.JobClient:     Reduce input records=320000
09/01/13 14:20:43 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/01/13 14:20:43 WARN mapred.JobClient: Use genericOptions for the
option -libjars
09/01/13 14:20:46 WARN mapred.JobClient: No job jar file set.  User
classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
0->d8g055.nhncorp.com:,000000000000751
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
1->d8g053.nhncorp.com:000000000000751,000000000001365
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
2->d8g054.nhncorp.com:000000000001365,000000000002364
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
3->d8g054.nhncorp.com:000000000002364,000000000003271
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
4->d8g055.nhncorp.com:000000000003271,000000000004175
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
5->d8g053.nhncorp.com:000000000004175,000000000005069
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
6->d8g053.nhncorp.com:000000000005069,000000000005965
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
7->d8g054.nhncorp.com:000000000005965,000000000006982
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
8->d8g055.nhncorp.com:000000000006982,000000000007490
09/01/13 14:20:47 INFO mapred.TableInputFormatBase: split:
9->d8g055.nhncorp.com:000000000007490,
09/01/13 14:20:47 INFO mapred.JobClient: Running job: job_200901131229_0009
09/01/13 14:20:48 INFO mapred.JobClient:  map 0% reduce 0%
09/01/13 14:24:01 INFO mapred.JobClient:  map 10% reduce 0%
09/01/13 14:24:16 INFO mapred.JobClient:  map 10% reduce 2%
09/01/13 14:24:22 INFO mapred.JobClient:  map 20% reduce 2%
09/01/13 14:24:35 INFO mapred.JobClient:  map 20% reduce 3%
09/01/13 14:24:37 INFO mapred.JobClient:  map 20% reduce 4%
09/01/13 14:24:57 INFO mapred.JobClient:  map 30% reduce 4%
09/01/13 14:25:07 INFO mapred.JobClient:  map 40% reduce 4%
09/01/13 14:25:11 INFO mapred.JobClient:  map 40% reduce 5%
09/01/13 14:25:13 INFO mapred.JobClient:  map 40% reduce 6%
09/01/13 14:25:18 INFO mapred.JobClient:  map 50% reduce 7%
09/01/13 14:25:22 INFO mapred.JobClient:  map 60% reduce 7%
09/01/13 14:25:28 INFO mapred.JobClient:  map 60% reduce 8%
09/01/13 14:25:31 INFO mapred.JobClient:  map 60% reduce 9%
09/01/13 14:25:33 INFO mapred.JobClient:  map 60% reduce 10%
09/01/13 14:25:36 INFO mapred.JobClient:  map 60% reduce 11%
09/01/13 14:27:07 INFO mapred.JobClient:  map 70% reduce 11%
09/01/13 14:27:14 INFO mapred.JobClient:  map 70% reduce 12%
09/01/13 14:27:19 INFO mapred.JobClient:  map 70% reduce 13%
09/01/13 14:28:05 INFO mapred.JobClient:  map 80% reduce 13%
09/01/13 14:28:12 INFO mapred.JobClient:  map 80% reduce 14%
09/01/13 14:28:20 INFO mapred.JobClient:  map 80% reduce 15%
09/01/13 14:29:13 INFO mapred.JobClient:  map 90% reduce 15%
09/01/13 14:29:23 INFO mapred.JobClient:  map 90% reduce 16%
09/01/13 14:29:26 INFO mapred.JobClient:  map 90% reduce 17%
09/01/13 14:29:56 INFO mapred.JobClient:  map 100% reduce 17%
09/01/13 14:30:04 INFO mapred.JobClient:  map 100% reduce 21%
09/01/13 14:30:05 INFO mapred.JobClient:  map 100% reduce 25%
09/01/13 14:30:06 INFO mapred.JobClient:  map 100% reduce 29%
09/01/13 14:30:09 INFO mapred.JobClient:  map 100% reduce 32%
09/01/13 14:30:10 INFO mapred.JobClient:  map 100% reduce 36%
09/01/13 14:30:11 INFO mapred.JobClient:  map 100% reduce 40%
09/01/13 14:31:20 INFO mapred.JobClient:  map 100% reduce 41%
09/01/13 14:32:57 INFO mapred.JobClient:  map 100% reduce 42%
09/01/13 14:34:56 INFO mapred.JobClient:  map 100% reduce 43%
09/01/13 14:36:02 INFO mapred.JobClient:  map 100% reduce 44%
09/01/13 14:36:51 INFO mapred.JobClient:  map 100% reduce 45%
09/01/13 14:38:20 INFO mapred.JobClient:  map 100% reduce 46%
09/01/13 14:39:36 INFO mapred.JobClient:  map 100% reduce 47%
09/01/13 14:40:39 INFO mapred.JobClient:  map 100% reduce 48%
09/01/13 14:42:09 INFO mapred.JobClient:  map 100% reduce 49%
09/01/13 14:43:10 INFO mapred.JobClient:  map 100% reduce 50%
09/01/13 14:45:05 INFO mapred.JobClient:  map 100% reduce 51%
09/01/13 14:47:17 INFO mapred.JobClient:  map 100% reduce 52%
09/01/13 14:48:27 INFO mapred.JobClient:  map 100% reduce 53%
09/01/13 14:49:25 INFO mapred.JobClient:  map 100% reduce 54%
09/01/13 14:50:41 INFO mapred.JobClient:  map 100% reduce 55%
09/01/13 14:51:37 INFO mapred.JobClient:  map 100% reduce 56%
09/01/13 14:52:40 INFO mapred.JobClient:  map 100% reduce 57%
09/01/13 14:53:37 INFO mapred.JobClient:  map 100% reduce 58%
09/01/13 14:55:06 INFO mapred.JobClient:  map 100% reduce 59%
09/01/13 14:55:42 INFO mapred.JobClient:  map 100% reduce 62%
09/01/13 14:55:54 INFO mapred.JobClient:  map 100% reduce 65%
09/01/13 14:55:59 INFO mapred.JobClient:  map 100% reduce 72%
09/01/13 14:56:21 INFO mapred.JobClient:  map 100% reduce 73%
09/01/13 14:56:57 INFO mapred.JobClient:  map 100% reduce 75%
09/01/13 14:57:02 INFO mapred.JobClient:  map 100% reduce 78%
09/01/13 14:57:07 INFO mapred.JobClient:  map 100% reduce 79%
09/01/13 14:57:12 INFO mapred.JobClient:  map 100% reduce 87%
09/01/13 14:58:20 INFO mapred.JobClient:  map 100% reduce 88%
09/01/13 14:59:40 INFO mapred.JobClient:  map 100% reduce 89%
09/01/13 15:01:03 INFO mapred.JobClient:  map 100% reduce 90%
09/01/13 15:03:14 INFO mapred.JobClient:  map 100% reduce 91%
09/01/13 15:05:03 INFO mapred.JobClient:  map 100% reduce 92%
09/01/13 15:06:47 INFO mapred.JobClient:  map 100% reduce 93%
09/01/13 15:09:11 INFO mapred.JobClient:  map 100% reduce 94%
09/01/13 15:10:43 INFO mapred.JobClient:  map 100% reduce 95%
09/01/13 15:11:58 INFO mapred.JobClient:  map 100% reduce 96%
09/01/13 15:12:39 INFO mapred.JobClient:  map 100% reduce 97%
09/01/13 15:13:33 INFO mapred.JobClient:  map 100% reduce 98%
09/01/13 15:14:28 INFO mapred.JobClient:  map 100% reduce 99%
09/01/13 15:15:32 INFO mapred.JobClient:  map 100% reduce 100%
09/01/13 15:15:39 INFO mapred.JobClient: Job complete: job_200901131229_0009
09/01/13 15:15:39 INFO mapred.JobClient: Counters: 15
09/01/13 15:15:39 INFO mapred.JobClient:   File Systems
09/01/13 15:15:39 INFO mapred.JobClient:     Local bytes read=4355174656
09/01/13 15:15:39 INFO mapred.JobClient:     Local bytes written=6530992048
09/01/13 15:15:39 INFO mapred.JobClient:   Job Counters
09/01/13 15:15:39 INFO mapred.JobClient:     Launched reduce tasks=12
09/01/13 15:15:39 INFO mapred.JobClient:     Rack-local map tasks=4
09/01/13 15:15:39 INFO mapred.JobClient:     Launched map tasks=14
09/01/13 15:15:39 INFO mapred.JobClient:     Data-local map tasks=10
09/01/13 15:15:39 INFO mapred.JobClient:   Map-Reduce Framework
09/01/13 15:15:39 INFO mapred.JobClient:     Reduce input groups=1600
09/01/13 15:15:39 INFO mapred.JobClient:     Combine output records=0
09/01/13 15:15:39 INFO mapred.JobClient:     Map input records=8000
09/01/13 15:15:39 INFO mapred.JobClient:     Reduce output records=64000
09/01/13 15:15:39 INFO mapred.JobClient:     Map output bytes=2175715600
09/01/13 15:15:39 INFO mapred.JobClient:     Map input bytes=0
09/01/13 15:15:39 INFO mapred.JobClient:     Combine input records=0
09/01/13 15:15:39 INFO mapred.JobClient:     Map output records=320000
09/01/13 15:15:39 INFO mapred.JobClient:     Reduce input records=320000
09/01/13 15:15:40 INFO hama.AbstractMatrix: Initializing the matrix storage.
09/01/13 15:15:44 INFO hama.AbstractMatrix: Create Matrix DenseMatrix_randcrwcp
09/01/13 15:15:44 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/01/13 15:15:44 WARN mapred.JobClient: Use genericOptions for the
option -libjars
09/01/13 15:15:44 WARN mapred.JobClient: No job jar file set.  User
classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
0->d8g054.nhncorp.com:,00000000000,15,35-25403
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
1->d8g055.nhncorp.com:00000000000,15,35-25403,00000000000,20,17-32709
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
2->d8g054.nhncorp.com:00000000000,20,17-32709,00000000000,24,26-39446
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
3->d8g053.nhncorp.com:00000000000,24,26-39446,00000000000,28,39-46363
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
4->d8g053.nhncorp.com:00000000000,28,39-46363,00000000000,32,19-51978
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
5->d8g054.nhncorp.com:00000000000,32,19-51978,00000000000,37,13-59758
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
6->d8g054.nhncorp.com:00000000000,37,13-59758,000000000000,1,26-2645
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
7->d8g054.nhncorp.com:000000000000,1,26-2645,000000000000,26,2-41710
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
8->d8g053.nhncorp.com:000000000000,26,2-41710,000000000000,5,31-9247
09/01/13 15:15:44 INFO mapred.TableInputFormatBase: split:
9->d8g055.nhncorp.com:000000000000,5,31-9247,
09/01/13 15:15:44 INFO mapred.JobClient: Running job: job_200901131229_0010
09/01/13 15:15:45 INFO mapred.JobClient:  map 0% reduce 0%
09/01/13 15:22:40 INFO mapred.JobClient:  map 10% reduce 0%
09/01/13 15:22:53 INFO mapred.JobClient:  map 10% reduce 1%
09/01/13 15:22:58 INFO mapred.JobClient:  map 10% reduce 2%
09/01/13 15:25:17 INFO mapred.JobClient:  map 20% reduce 2%
09/01/13 15:25:25 INFO mapred.JobClient:  map 30% reduce 2%
09/01/13 15:25:32 INFO mapred.JobClient:  map 30% reduce 3%
09/01/13 15:25:36 INFO mapred.JobClient:  map 30% reduce 4%
09/01/13 15:25:40 INFO mapred.JobClient:  map 30% reduce 5%
09/01/13 15:25:42 INFO mapred.JobClient:  map 30% reduce 6%
09/01/13 15:25:43 INFO mapred.JobClient:  map 40% reduce 6%
09/01/13 15:25:55 INFO mapred.JobClient:  map 40% reduce 7%
09/01/13 15:29:14 INFO mapred.JobClient:  map 50% reduce 7%
09/01/13 15:29:27 INFO mapred.JobClient:  map 50% reduce 8%
09/01/13 15:29:29 INFO mapred.JobClient:  map 50% reduce 9%
09/01/13 15:35:18 INFO mapred.JobClient:  map 60% reduce 9%
09/01/13 15:35:28 INFO mapred.JobClient:  map 60% reduce 10%
09/01/13 15:35:36 INFO mapred.JobClient:  map 60% reduce 11%
09/01/13 15:36:04 INFO mapred.JobClient:  map 70% reduce 11%
09/01/13 15:36:14 INFO mapred.JobClient:  map 70% reduce 12%
09/01/13 15:36:19 INFO mapred.JobClient:  map 70% reduce 13%
09/01/13 15:38:13 INFO mapred.JobClient:  map 80% reduce 13%
09/01/13 15:38:23 INFO mapred.JobClient:  map 80% reduce 14%
09/01/13 15:38:29 INFO mapred.JobClient:  map 80% reduce 15%
09/01/13 15:42:34 INFO mapred.JobClient:  map 90% reduce 15%
09/01/13 15:42:44 INFO mapred.JobClient:  map 90% reduce 16%
09/01/13 15:42:49 INFO mapred.JobClient:  map 90% reduce 17%
09/01/13 15:45:07 INFO mapred.JobClient:  map 100% reduce 17%
09/01/13 15:45:12 INFO mapred.JobClient:  map 100% reduce 21%
09/01/13 15:45:16 INFO mapred.JobClient:  map 100% reduce 22%
09/01/13 15:45:20 INFO mapred.JobClient:  map 100% reduce 25%
09/01/13 15:45:21 INFO mapred.JobClient:  map 100% reduce 29%
09/01/13 15:45:26 INFO mapred.JobClient:  map 100% reduce 33%
09/01/13 15:45:31 INFO mapred.JobClient:  map 100% reduce 34%
09/01/13 15:45:36 INFO mapred.JobClient:  map 100% reduce 40%
09/01/13 15:45:41 INFO mapred.JobClient:  map 100% reduce 41%
09/01/13 15:45:56 INFO mapred.JobClient:  map 100% reduce 42%
09/01/13 15:46:12 INFO mapred.JobClient:  map 100% reduce 43%
09/01/13 15:46:31 INFO mapred.JobClient:  map 100% reduce 44%
09/01/13 15:46:56 INFO mapred.JobClient:  map 100% reduce 45%
09/01/13 15:47:12 INFO mapred.JobClient:  map 100% reduce 46%
09/01/13 15:47:32 INFO mapred.JobClient:  map 100% reduce 47%
09/01/13 15:47:51 INFO mapred.JobClient:  map 100% reduce 48%
09/01/13 15:48:21 INFO mapred.JobClient:  map 100% reduce 49%
09/01/13 15:48:45 INFO mapred.JobClient:  map 100% reduce 50%
09/01/13 15:49:07 INFO mapred.JobClient:  map 100% reduce 51%
09/01/13 15:49:31 INFO mapred.JobClient:  map 100% reduce 52%
09/01/13 15:49:52 INFO mapred.JobClient:  map 100% reduce 53%
09/01/13 15:50:17 INFO mapred.JobClient:  map 100% reduce 54%
09/01/13 15:50:46 INFO mapred.JobClient:  map 100% reduce 55%
09/01/13 15:51:12 INFO mapred.JobClient:  map 100% reduce 56%
09/01/13 15:51:36 INFO mapred.JobClient:  map 100% reduce 57%
09/01/13 15:51:56 INFO mapred.JobClient:  map 100% reduce 58%
09/01/13 15:52:12 INFO mapred.JobClient:  map 100% reduce 59%
09/01/13 15:52:27 INFO mapred.JobClient:  map 100% reduce 60%
09/01/13 15:52:45 INFO mapred.JobClient:  map 100% reduce 61%
09/01/13 15:52:56 INFO mapred.JobClient:  map 100% reduce 62%
09/01/13 15:53:01 INFO mapred.JobClient:  map 100% reduce 63%
09/01/13 15:53:25 INFO mapred.JobClient:  map 100% reduce 64%
09/01/13 15:53:27 INFO mapred.JobClient:  map 100% reduce 65%
09/01/13 15:53:32 INFO mapred.JobClient:  map 100% reduce 66%
09/01/13 15:53:36 INFO mapred.JobClient:  map 100% reduce 67%
09/01/13 15:54:17 INFO mapred.JobClient:  map 100% reduce 68%
09/01/13 15:54:22 INFO mapred.JobClient:  map 100% reduce 69%
09/01/13 15:54:47 INFO mapred.JobClient:  map 100% reduce 70%
09/01/13 15:55:29 INFO mapred.JobClient:  map 100% reduce 71%
09/01/13 15:55:57 INFO mapred.JobClient:  map 100% reduce 75%
09/01/13 15:56:04 INFO mapred.JobClient:  map 100% reduce 78%
09/01/13 15:56:06 INFO mapred.JobClient:  map 100% reduce 79%
09/01/13 15:56:16 INFO mapred.JobClient:  map 100% reduce 83%
09/01/13 15:56:21 INFO mapred.JobClient:  map 100% reduce 86%
09/01/13 15:56:44 INFO mapred.JobClient:  map 100% reduce 87%
09/01/13 15:57:12 INFO mapred.JobClient:  map 100% reduce 88%
09/01/13 15:57:52 INFO mapred.JobClient:  map 100% reduce 89%
09/01/13 15:58:39 INFO mapred.JobClient:  map 100% reduce 90%
09/01/13 15:59:25 INFO mapred.JobClient:  map 100% reduce 91%
09/01/13 16:00:02 INFO mapred.JobClient:  map 100% reduce 92%
09/01/13 16:00:52 INFO mapred.JobClient:  map 100% reduce 93%
09/01/13 16:01:47 INFO mapred.JobClient:  map 100% reduce 94%
09/01/13 16:02:43 INFO mapred.JobClient:  map 100% reduce 95%
09/01/13 16:03:23 INFO mapred.JobClient:  map 100% reduce 96%
09/01/13 16:04:08 INFO mapred.JobClient:  map 100% reduce 97%
09/01/13 16:04:38 INFO mapred.JobClient:  map 100% reduce 98%
09/01/13 16:05:12 INFO mapred.JobClient:  map 100% reduce 99%
09/01/13 16:06:03 INFO mapred.JobClient:  map 100% reduce 100%
09/01/13 16:06:17 INFO mapred.JobClient: Job complete: job_200901131229_0010
09/01/13 16:06:17 INFO mapred.JobClient: Counters: 15
09/01/13 16:06:17 INFO mapred.JobClient:   File Systems
09/01/13 16:06:17 INFO mapred.JobClient:     Local bytes read=56408501475
09/01/13 16:06:17 INFO mapred.JobClient:     Local bytes written=76882028610
09/01/13 16:06:17 INFO mapred.JobClient:   Job Counters
09/01/13 16:06:17 INFO mapred.JobClient:     Launched reduce tasks=12
09/01/13 16:06:17 INFO mapred.JobClient:     Rack-local map tasks=3
09/01/13 16:06:17 INFO mapred.JobClient:     Launched map tasks=13
09/01/13 16:06:17 INFO mapred.JobClient:     Data-local map tasks=10
09/01/13 16:06:17 INFO mapred.JobClient:   Map-Reduce Framework
09/01/13 16:06:17 INFO mapred.JobClient:     Reduce input groups=1600
09/01/13 16:06:17 INFO mapred.JobClient:     Combine output records=0
09/01/13 16:06:17 INFO mapred.JobClient:     Map input records=64000
09/01/13 16:06:17 INFO mapred.JobClient:     Reduce output records=320000
09/01/13 16:06:17 INFO mapred.JobClient:     Map output bytes=20481920000
09/01/13 16:06:17 INFO mapred.JobClient:     Map input bytes=0
09/01/13 16:06:17 INFO mapred.JobClient:     Combine input records=0
09/01/13 16:06:17 INFO mapred.JobClient:     Map output records=64000
09/01/13 16:06:17 INFO mapred.JobClient:     Reduce input records=64000
c(0, 0) : 1979.5115070085433
c(0, 1) : 2013.49562739548
...
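As a sanity check, the framework counters above are consistent with a 40 x 40 blocking of an 8,000 x 8,000 matrix (this is my reading of the numbers, not taken from the Hama source):

```python
# Sanity check of the job counters against the 40 x 40 blocking.
n, blocks = 8000, 40
rows_per_block = n // blocks                    # 200 matrix rows per block row
assert n * blocks == 320000                     # jobs 1-2 map output records
                                                # (each row split into 40 column blocks)
assert blocks ** 3 == 64000                     # collectionTable rows
                                                # (job 3 map input records)
assert blocks ** 2 == 1600                      # c(i, k) blocks
                                                # (job 3 reduce input groups)
assert blocks ** 2 * rows_per_block == 320000   # job 3 reduce output records
```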


On Wed, Jan 7, 2009 at 6:20 PM, Edward J. Yoon <ed...@apache.org> wrote:
> After commit HAMA-142, I finally fulfilled the multiplication of
> 10,000 * 10,000 dense matrices. I am gratified with this result. But,
> there is a lot of netsent/netreceived bytes between master and slaves
> and overhead of read operation in a loop during multiplication.
>
> BTW, blocked dense matrix have small rows. Hence, It doesn't
> horizontally spread to each machine.
>
> 09/01/07 17:36:14 INFO mapred.TableInputFormatBase: split:
> 0->d8g053.nhncorp.com:,000000000000,0,10
> 09/01/07 17:36:14 INFO mapred.TableInputFormatBase: split:
> 1->d8g053.nhncorp.com:000000000000,0,10,
>
> /Edward
>
> On Tue, Jan 6, 2009 at 2:11 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> Oh, sorry. It's 8 GB.
>>
>> On Tue, Jan 6, 2009 at 2:05 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>> Let's assume matrix a * b of 10,000 * 10,000 dense matrices,
>>>
>>> 5 * 5 blocks,
>>> 1 block is 2000 * 2000 and 16 MB,
>>>
>>> 0 : c(0, 0) += a(0, 0) * b(0, 0)
>>> 1 : c(0, 1) += a(0, 0) * b(0, 1)
>>> ...
>>> 123 : c(4, 3) += a(4, 4) * b(4, 3)
>>> 124 : c(4, 4) += a(4, 4) * b(4, 4)
>>>
>>> 5^3 * 32 MB = 4 GB.
>>>
>>> collection table size is 4 GB. Anyway, let's try it.
>>>
>>> On Tue, Jan 6, 2009 at 12:37 PM, Samuel Guo <gu...@gmail.com> wrote:
>>>> +1
>>>> hmm, it is tricky.
>>>>
>>>> On Tue, Jan 6, 2009 at 11:04 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>>>
>>>>> If we collect blocks to one table during blocking_mapred(), locality
>>>>> will be provided and more faster.
>>>>>
>>>>> row Key   column:A   column:B
>>>>> c(0, 0) += a(0, 0) * b(0, 0)
>>>>> c(0, 0) += a(0, 1) * b(1, 0)
>>>>> c(0, 0) += a(0, 2) * b(2, 0)
>>>>> c(0, 0) += a(0, 3) * b(3, 0)
>>>>> c(0, 1) += a(0, 0) * b(0, 1)
>>>>> c(0, 1) += a(0, 1) * b(1, 1)
>>>>> ...
>>>>>
>>>>> What do you think?
>>>>>
>>>>> On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <ed...@apache.org>
>>>>> wrote:
>>>>> > Hama Trunk doesn't work for large matrices multiplication with
>>>>> > mapred.task.timeout and scanner.timeout exception. I tried 1,000,000 *
>>>>> > 1,000,000 matrix multiplication on 100 node. (Rests are good)
>>>>> >
>>>>> > To reduce read operation of duplicated block, I thought as describe
>>>>> > below. But, each map processing seems too large.
>>>>> >
>>>>> > ----
>>>>> > // c[i][k] += a[i][j] * b[j][k];
>>>>> >
>>>>> > map() {
>>>>> >  SubMatrix a = value.get();
>>>>> >
>>>>> >  for (RowResult row : scan) {
>>>>> >     collect : c[i][k] = a * b[j][k];
>>>>> >  }
>>>>> > }
>>>>> >
>>>>> > reduce() {
>>>>> >  c[i][k] += c[i][k];
>>>>> > }
>>>>> > ----
>>>>> >
>>>>> > Should we increase {mapred.task.timeout and scanner.timeout}?
>>>>> > or any good idea?
>>>>> >
>>>>> > --
>>>>> > Best Regards, Edward J. Yoon @ NHN, corp.
>>>>> > edwardyoon@apache.org
>>>>> > http://blog.udanax.org
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon @ NHN, corp.
>>>>> edwardyoon@apache.org
>>>>> http://blog.udanax.org
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon @ NHN, corp.
>>> edwardyoon@apache.org
>>> http://blog.udanax.org
>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Large matrices multiplication problem.

Posted by "Edward J. Yoon" <ed...@apache.org>.
After committing HAMA-142, I finally accomplished the multiplication of
10,000 * 10,000 dense matrices. I am pleased with this result. But there
is a lot of network traffic (bytes sent/received) between master and
slaves, and the read operations in the loop add overhead during
multiplication.

BTW, the blocked dense matrix has only a few rows, so it doesn't spread
horizontally across the machines.

09/01/07 17:36:14 INFO mapred.TableInputFormatBase: split:
0->d8g053.nhncorp.com:,000000000000,0,10
09/01/07 17:36:14 INFO mapred.TableInputFormatBase: split:
1->d8g053.nhncorp.com:000000000000,0,10,
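A rough illustration of why the blocked table splits so poorly (my arithmetic, assuming the table is range-partitioned on row keys, so few distinct rows means few splits):

```python
# Illustrative: a 10,000 x 10,000 matrix stored as 2000 x 2000 blocks
# has only 5 block rows, so a row-range-partitioned table can yield
# only a handful of splits -- hence the two splits in the log above,
# both landing on one machine.
n, block_size = 10000, 2000
block_rows = n // block_size
assert block_rows == 5
```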

/Edward

On Tue, Jan 6, 2009 at 2:11 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Oh, sorry. It's 8 GB.
>
> On Tue, Jan 6, 2009 at 2:05 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> Let's assume matrix a * b of 10,000 * 10,000 dense matrices,
>>
>> 5 * 5 blocks,
>> 1 block is 2000 * 2000 and 16 MB,
>>
>> 0 : c(0, 0) += a(0, 0) * b(0, 0)
>> 1 : c(0, 1) += a(0, 0) * b(0, 1)
>> ...
>> 123 : c(4, 3) += a(4, 4) * b(4, 3)
>> 124 : c(4, 4) += a(4, 4) * b(4, 4)
>>
>> 5^3 * 32 MB = 4 GB.
>>
>> collection table size is 4 GB. Anyway, let's try it.
>>
>> On Tue, Jan 6, 2009 at 12:37 PM, Samuel Guo <gu...@gmail.com> wrote:
>>> +1
>>> hmm, it is tricky.
>>>
>>> On Tue, Jan 6, 2009 at 11:04 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>>
>>>> If we collect blocks to one table during blocking_mapred(), locality
>>>> will be provided and more faster.
>>>>
>>>> row Key   column:A   column:B
>>>> c(0, 0) += a(0, 0) * b(0, 0)
>>>> c(0, 0) += a(0, 1) * b(1, 0)
>>>> c(0, 0) += a(0, 2) * b(2, 0)
>>>> c(0, 0) += a(0, 3) * b(3, 0)
>>>> c(0, 1) += a(0, 0) * b(0, 1)
>>>> c(0, 1) += a(0, 1) * b(1, 1)
>>>> ...
>>>>
>>>> What do you think?
>>>>
>>>> On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <ed...@apache.org>
>>>> wrote:
>>>> > Hama Trunk doesn't work for large matrices multiplication with
>>>> > mapred.task.timeout and scanner.timeout exception. I tried 1,000,000 *
>>>> > 1,000,000 matrix multiplication on 100 node. (Rests are good)
>>>> >
>>>> > To reduce read operation of duplicated block, I thought as describe
>>>> > below. But, each map processing seems too large.
>>>> >
>>>> > ----
>>>> > // c[i][k] += a[i][j] * b[j][k];
>>>> >
>>>> > map() {
>>>> >  SubMatrix a = value.get();
>>>> >
>>>> >  for (RowResult row : scan) {
>>>> >     collect : c[i][k] = a * b[j][k];
>>>> >  }
>>>> > }
>>>> >
>>>> > reduce() {
>>>> >  c[i][k] += c[i][k];
>>>> > }
>>>> > ----
>>>> >
>>>> > Should we increase {mapred.task.timeout and scanner.timeout}?
>>>> > or any good idea?
>>>> >
>>>> > --
>>>> > Best Regards, Edward J. Yoon @ NHN, corp.
>>>> > edwardyoon@apache.org
>>>> > http://blog.udanax.org
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon @ NHN, corp.
>>>> edwardyoon@apache.org
>>>> http://blog.udanax.org
>>>>
>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Large matrices multiplication problem.

Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, sorry. It's 8 GB.

On Tue, Jan 6, 2009 at 2:05 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Let's assume matrix a * b of 10,000 * 10,000 dense matrices,
>
> 5 * 5 blocks,
> 1 block is 2000 * 2000 and 16 MB,
>
> 0 : c(0, 0) += a(0, 0) * b(0, 0)
> 1 : c(0, 1) += a(0, 0) * b(0, 1)
> ...
> 123 : c(4, 3) += a(4, 4) * b(4, 3)
> 124 : c(4, 4) += a(4, 4) * b(4, 4)
>
> 5^3 * 32 MB = 4 GB.
>
> collection table size is 4 GB. Anyway, let's try it.
>
> On Tue, Jan 6, 2009 at 12:37 PM, Samuel Guo <gu...@gmail.com> wrote:
>> +1
>> hmm, it is tricky.
>>
>> On Tue, Jan 6, 2009 at 11:04 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>
>>> If we collect blocks to one table during blocking_mapred(), locality
>>> will be provided and more faster.
>>>
>>> row Key   column:A   column:B
>>> c(0, 0) += a(0, 0) * b(0, 0)
>>> c(0, 0) += a(0, 1) * b(1, 0)
>>> c(0, 0) += a(0, 2) * b(2, 0)
>>> c(0, 0) += a(0, 3) * b(3, 0)
>>> c(0, 1) += a(0, 0) * b(0, 1)
>>> c(0, 1) += a(0, 1) * b(1, 1)
>>> ...
>>>
>>> What do you think?
>>>
>>> On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> > Hama Trunk doesn't work for large matrices multiplication with
>>> > mapred.task.timeout and scanner.timeout exception. I tried 1,000,000 *
>>> > 1,000,000 matrix multiplication on 100 node. (Rests are good)
>>> >
>>> > To reduce read operation of duplicated block, I thought as describe
>>> > below. But, each map processing seems too large.
>>> >
>>> > ----
>>> > // c[i][k] += a[i][j] * b[j][k];
>>> >
>>> > map() {
>>> >  SubMatrix a = value.get();
>>> >
>>> >  for (RowResult row : scan) {
>>> >     collect : c[i][k] = a * b[j][k];
>>> >  }
>>> > }
>>> >
>>> > reduce() {
>>> >  c[i][k] += c[i][k];
>>> > }
>>> > ----
>>> >
>>> > Should we increase {mapred.task.timeout and scanner.timeout}?
>>> > or any good idea?
>>> >
>>> > --
>>> > Best Regards, Edward J. Yoon @ NHN, corp.
>>> > edwardyoon@apache.org
>>> > http://blog.udanax.org
>>> >
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon @ NHN, corp.
>>> edwardyoon@apache.org
>>> http://blog.udanax.org
>>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Large matrices multiplication problem.

Posted by "Edward J. Yoon" <ed...@apache.org>.
Let's assume a * b for two 10,000 * 10,000 dense matrices, blocked
into a 5 * 5 grid, so each block is 2000 * 2000 (16 MB):

0 : c(0, 0) += a(0, 0) * b(0, 0)
1 : c(0, 1) += a(0, 0) * b(0, 1)
...
123 : c(4, 3) += a(4, 4) * b(4, 3)
124 : c(4, 4) += a(4, 4) * b(4, 4)

5^3 rows * 32 MB (one 16 MB block from A plus one from B per row) = 4 GB.

So the collection table size is 4 GB. Anyway, let's try it.
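
The arithmetic can be sketched as a quick Java check. Assuming 4-byte
floats (which is what makes a 2000 * 2000 block come out to 16 MB),
5^3 rows of two blocks each gives 4 GB; with 8-byte doubles the same
table doubles to 8 GB. This is only an illustration of the estimate,
not Hama code:

```java
// Size estimate for the collection table of a blocked multiply.
// Assumption: 4 bytes/element (float) reproduces the 16 MB-per-block
// figure above; 8 bytes/element (double) doubles everything.
public class CollectionTableSize {

    // blocks: blocking factor per dimension (5 -> 5 x 5 grid)
    // blockDim: rows/cols per block (2000)
    // elemBytes: bytes per matrix element
    static long tableBytes(int blocks, int blockDim, int elemBytes) {
        long blockBytes = (long) blockDim * blockDim * elemBytes;
        long rows = (long) blocks * blocks * blocks; // one row per c(i,j) += a(i,k) * b(k,j)
        return rows * 2 * blockBytes;                // each row holds an A block and a B block
    }

    public static void main(String[] args) {
        System.out.println(tableBytes(5, 2000, 4) / 1_000_000_000L + " GB"); // floats
        System.out.println(tableBytes(5, 2000, 8) / 1_000_000_000L + " GB"); // doubles
    }
}
```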

On Tue, Jan 6, 2009 at 12:37 PM, Samuel Guo <gu...@gmail.com> wrote:
> +1
> hmm, it is tricky.
>
> On Tue, Jan 6, 2009 at 11:04 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> If we collect blocks to one table during blocking_mapred(), locality
>> will be provided and more faster.
>>
>> row Key   column:A   column:B
>> c(0, 0) += a(0, 0) * b(0, 0)
>> c(0, 0) += a(0, 1) * b(1, 0)
>> c(0, 0) += a(0, 2) * b(2, 0)
>> c(0, 0) += a(0, 3) * b(3, 0)
>> c(0, 1) += a(0, 0) * b(0, 1)
>> c(0, 1) += a(0, 1) * b(1, 1)
>> ...
>>
>> What do you think?
>>
>> On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Hama Trunk doesn't work for large matrices multiplication with
>> > mapred.task.timeout and scanner.timeout exception. I tried 1,000,000 *
>> > 1,000,000 matrix multiplication on 100 node. (Rests are good)
>> >
>> > To reduce read operation of duplicated block, I thought as describe
>> > below. But, each map processing seems too large.
>> >
>> > ----
>> > // c[i][k] += a[i][j] * b[j][k];
>> >
>> > map() {
>> >  SubMatrix a = value.get();
>> >
>> >  for (RowResult row : scan) {
>> >     collect : c[i][k] = a * b[j][k];
>> >  }
>> > }
>> >
>> > reduce() {
>> >  c[i][k] += c[i][k];
>> > }
>> > ----
>> >
>> > Should we increase {mapred.task.timeout and scanner.timeout}?
>> > or any good idea?
>> >
>> > --
>> > Best Regards, Edward J. Yoon @ NHN, corp.
>> > edwardyoon@apache.org
>> > http://blog.udanax.org
>> >
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Large matrices multiplication problem.

Posted by Samuel Guo <gu...@gmail.com>.
+1
hmm, it is tricky.

On Tue, Jan 6, 2009 at 11:04 AM, Edward J. Yoon <ed...@apache.org>wrote:

> If we collect blocks to one table during blocking_mapred(), locality
> will be provided and more faster.
>
> row Key   column:A   column:B
> c(0, 0) += a(0, 0) * b(0, 0)
> c(0, 0) += a(0, 1) * b(1, 0)
> c(0, 0) += a(0, 2) * b(2, 0)
> c(0, 0) += a(0, 3) * b(3, 0)
> c(0, 1) += a(0, 0) * b(0, 1)
> c(0, 1) += a(0, 1) * b(1, 1)
> ...
>
> What do you think?
>
> On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > Hama Trunk doesn't work for large matrices multiplication with
> > mapred.task.timeout and scanner.timeout exception. I tried 1,000,000 *
> > 1,000,000 matrix multiplication on 100 node. (Rests are good)
> >
> > To reduce read operation of duplicated block, I thought as describe
> > below. But, each map processing seems too large.
> >
> > ----
> > // c[i][k] += a[i][j] * b[j][k];
> >
> > map() {
> >  SubMatrix a = value.get();
> >
> >  for (RowResult row : scan) {
> >     collect : c[i][k] = a * b[j][k];
> >  }
> > }
> >
> > reduce() {
> >  c[i][k] += c[i][k];
> > }
> > ----
> >
> > Should we increase {mapred.task.timeout and scanner.timeout}?
> > or any good idea?
> >
> > --
> > Best Regards, Edward J. Yoon @ NHN, corp.
> > edwardyoon@apache.org
> > http://blog.udanax.org
> >
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>

Re: Large matrices multiplication problem.

Posted by "Edward J. Yoon" <ed...@apache.org>.
If we collect the blocks into one table during blocking_mapred(),
locality will be provided and multiplication will be faster.

row Key   column:A   column:B
c(0, 0) += a(0, 0) * b(0, 0)
c(0, 0) += a(0, 1) * b(1, 0)
c(0, 0) += a(0, 2) * b(2, 0)
c(0, 0) += a(0, 3) * b(3, 0)
c(0, 1) += a(0, 0) * b(0, 1)
c(0, 1) += a(0, 1) * b(1, 1)
...

What do you think?
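
A sketch of how the rows of such a collection table could be generated
(the key format and names here are hypothetical, just to make the
layout above concrete): for an n * n blocking there is one row per
(i, j, k) triple, and a "-k" suffix on the key keeps the rows for the
same c(i, j) distinct while letting them sort together.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed collection-table layout (hypothetical key
// format): for an n x n blocking, emit one row per (i, j, k) triple.
// Each row pairs the A block and B block whose product contributes
// to the output block c(i, j).
public class CollectionTableLayout {

    // Returns rows like "c(0, 0)-0  A: a(0, 0)  B: b(0, 0)".
    static List<String> rows(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    out.add(String.format("c(%d, %d)-%d  A: a(%d, %d)  B: b(%d, %d)",
                                          i, j, k, i, k, k, j));
        return out; // n^3 rows in total
    }

    public static void main(String[] args) {
        rows(2).forEach(System.out::println); // 2^3 = 8 rows for a 2 x 2 blocking
    }
}
```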

On Mon, Jan 5, 2009 at 10:30 AM, Edward J. Yoon <ed...@apache.org> wrote:
> Hama Trunk doesn't work for large matrices multiplication with
> mapred.task.timeout and scanner.timeout exception. I tried 1,000,000 *
> 1,000,000 matrix multiplication on 100 node. (Rests are good)
>
> To reduce read operation of duplicated block, I thought as describe
> below. But, each map processing seems too large.
>
> ----
> // c[i][k] += a[i][j] * b[j][k];
>
> map() {
>  SubMatrix a = value.get();
>
>  for (RowResult row : scan) {
>     collect : c[i][k] = a * b[j][k];
>  }
> }
>
> reduce() {
>  c[i][k] += c[i][k];
> }
> ----
>
> Should we increase {mapred.task.timeout and scanner.timeout}?
> or any good idea?
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org
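
For reference, the map()/reduce() pseudocode quoted in this thread can
be written as a self-contained Java sketch over plain 2-D arrays (no
HBase scanner; names are illustrative). One reading of the reduce step
`c[i][k] += c[i][k]` is summing the partial products emitted for each
output cell, which is what this sketch does:

```java
import java.util.HashMap;
import java.util.Map;

// Runnable sketch of the quoted map()/reduce() pseudocode, over plain
// 2-D arrays instead of HBase scans (names are illustrative, not Hama API).
// "map" phase:    for each (i, j) and each column k of b, emit the partial
//                 product a[i][j] * b[j][k] keyed by the output cell (i, k);
//                 Map.merge plays the combiner/reducer role and sums them.
// "reduce" phase: write the summed partials back into c[i][k].
public class MapReduceMultiply {

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, p = b.length;
        Map<String, Double> partials = new HashMap<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < m; k++)
                    partials.merge(i + "," + k, a[i][j] * b[j][k], Double::sum);
        double[][] c = new double[n][m];
        for (Map.Entry<String, Double> e : partials.entrySet()) {
            String[] ik = e.getKey().split(",");
            c[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = e.getValue();
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]); // 19.0 22.0
        System.out.println(c[1][0] + " " + c[1][1]); // 43.0 50.0
    }
}
```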