Posted to user@mahout.apache.org by Paul Mahon <pm...@decarta.com> on 2011/05/01 00:34:02 UTC

Re: LanczosSolver Very Slow

Yep, I can confirm, it took me about 23 minutes to run it this time.

On 04/30/2011 12:24 PM, Jake Mannix wrote:
> Paul, I ran your data this morning (on my laptop, pseudo distributed mode)
>
> INFO: Program took 1388093 ms
>
> Yay!
>
>    -jake
>
> On Fri, Apr 29, 2011 at 10:24 PM, Jake Mannix<ja...@gmail.com>  wrote:
>
>> Great, let me know what you see.  I just checked in the last cleanup stuff.
>>
>>
>> On Fri, Apr 29, 2011 at 9:55 PM, Paul Mahon<pm...@decarta.com>  wrote:
>>
>>> Thanks for the quick work tracking this down. I'll give it another try and
>>> see if it's faster now.
>>>
>>>
>>> On 04/29/2011 09:46 PM, Jake Mannix wrote:
>>>
>>>> The commit I made a few minutes ago resolving MAHOUT-639 has most likely
>>>> fixed this, but a few cleanup fixes are coming shortly (replacing the
>>>> instanceof DenseVector / instanceof SequentialAccessSparseVector checks
>>>> with isDense() in the places where a NamedVector can show up), which will
>>>> avoid making too many extra vector copies.
>>>>
>>>> Paul, try trunk again?
>>>>
>>>>    -jake
>>>>
>>>> On Fri, Apr 29, 2011 at 2:32 PM, Jake Mannix<ja...@gmail.com> wrote:
>>>>
>>>>> Ok, I think I've found it.  It was this issue, and another similar one.
>>>>> Going to verify in a bit, and patch
>>>>> coming later this afternoon!
>>>>>
>>>>>    -jake
>>>>>
>>>>>
>>>>> On Fri, Apr 29, 2011 at 1:42 PM, Jake Mannix<jake.mannix@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think this is actually a bug (or series of bugs) somewhere, not a
>>>>>> configuration issue, as shown by the fact
>>>>>> that LanczosSolver finishes just fine in 10 min.
>>>>>>
>>>>>> I'm looking into it.  I've already found one problem (TimesSquaredJob
>>>>>> checks instanceof DenseVector instead
>>>>>> of calling isDense(), which gets the wrong answer for NamedVector
>>>>>> instances); seeing if this is the cause of the slowdown now.
>>>>>>
>>>>>>    -jake
>>>>>>
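For reference, the failure mode Jake describes above is that NamedVector is a
delegating wrapper: it is never an instanceof DenseVector even when the vector
it wraps is dense, while isDense() is forwarded to the wrapped vector. A
minimal sketch of the difference, assuming the Mahout 0.5-era Vector API (the
sketch class itself is ours, not Mahout code):

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.NamedVector;
    import org.apache.mahout.math.Vector;

    public class IsDenseSketch {
      public static void main(String[] args) {
        Vector v = new NamedVector(new DenseVector(new double[] {1, 2, 3}), "row0");

        // The buggy check: NamedVector is a wrapper, not a DenseVector
        // subclass, so this is false and the dense fast path is skipped.
        System.out.println(v instanceof DenseVector); // false

        // The fixed check: NamedVector delegates isDense() to the vector
        // it wraps, so this reports the underlying density.
        System.out.println(v.isDense());              // true
      }
    }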
>>>>>> On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler<kkrugler_lists@transpac.com>
>>>>>> wrote:
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> One random guess here - if the total in-use memory across all tasks
>>>>>>> exceeds physical memory, you've run out of RAM and are using swap.
>>>>>>>
>>>>>>> E.g. heavy use of JVM memory by hash maps is one way this can happen.
>>>>>>>
>>>>>>> Once that occurs, your performance will become effectively 0, as tasks
>>>>>>> compete with each other for memory.
>>>>>>>
>>>>>>> To see if this is happening, run 'top' on a slave and see how much
>>>>>>> swap is being used.
>>>>>>>
>>>>>>> -- Ken
>>>>>>>
>>>>>>> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>>>>>>>
>>>>>>>> I did dump the stack a few times. Here are a couple which came up
>>>>>>>> most often:
>>>>>>>
>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>
>>>>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>
>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>
>>>>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>
>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>>>>>>>
>>>>>>>>     at
>>>>>>>>
>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>
>>>>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>
>>>>>>>>
>>>>>>>>
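All three traces show the same hot path: TimesSquaredMapper driving
AbstractVector.assign across every index of a RandomAccessSparseVector, which
costs one OpenIntDoubleHashMap probe per cell, including the millions of cells
that hold zero. A small sketch of why the dense loop hurts, assuming the
Mahout 0.5-era Vector API (getQuick, setQuick, iterateNonZero); the sketch
class and its helper methods are hypothetical:

    import java.util.Iterator;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;

    public class SparseLoopSketch {
      // The pattern in the traces: a dense loop over a sparse vector pays
      // one hash lookup per index, zero entries included.
      static double sumDense(Vector v) {
        double sum = 0;
        for (int i = 0; i < v.size(); i++) {
          sum += v.getQuick(i);   // hash probe even where nothing is stored
        }
        return sum;
      }

      // The cheap alternative: walk only the stored non-zero entries.
      static double sumNonZero(Vector v) {
        double sum = 0;
        Iterator<Vector.Element> it = v.iterateNonZero();
        while (it.hasNext()) {
          sum += it.next().get(); // one step per stored entry only
        }
        return sum;
      }

      public static void main(String[] args) {
        Vector v = new RandomAccessSparseVector(2517335); // numCols from Paul's run
        v.setQuick(42, 1.0);
        v.setQuick(1234567, 2.5);
        System.out.println(sumDense(v) + " == " + sumNonZero(v)); // both 3.5
      }
    }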
>>>>>>>> On 04/29/2011 10:42 AM, Paul Mahon wrote:
>>>>>>>>
>>>>>>>>> The task logs don't have anything useable:
>>>>>>>>>
>>>>>>>>> *_syslog logs_*
>>>>>>>>>
>>>>>>>>> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>>>>>> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 50
>>>>>>>>> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
>>>>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
>>>>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>>>>>>>>>
>>>>>>>>> The driver log isn't very interesting either:
>>>>>>>>>
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete: job_201104221106_0128
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=402789200
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20143593
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle bytes=181251087
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>>>>>>>>> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the corpus so far...
>>>>>>>>> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>>>>>>>> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
>>>>>>>>> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 2.7694996270467187
>>>>>>>>> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 124.51350651732557
>>>>>>>>> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 254.94694717051294
>>>>>>>>> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 387.54002942185565
>>>>>>>>> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with eigenvalue 501.70725582316834
>>>>>>>>> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with eigenvalue 614.8114660750927
>>>>>>>>> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with eigenvalue 700.1694351007611
>>>>>>>>> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with eigenvalue 775.2520850269659
>>>>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with eigenvalue 39441.64442962535
>>>>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>>>>>>>>> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>>>>>>>>> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>> 11/04/28 14:29:12 INFO mapred.JobClient: Running job: job_201104221106_0129
>>>>>>>>> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>>>>>>>>> ...
>>>>>>>>> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>>>>>>>>> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The counters don't mean much to me either. Here's one set from an
>>>>>>>>> incomplete mapper:
>>>>>>>>>
>>>>>>>>> *FileSystemCounters*
>>>>>>>>>     HDFS_BYTES_READ     24,540,497
>>>>>>>>>
>>>>>>>>> *Map-Reduce Framework*
>>>>>>>>>     Combine output records     0
>>>>>>>>>     Map input records     115,815
>>>>>>>>>     Spilled Records     0
>>>>>>>>>     Map output bytes     0
>>>>>>>>>     Map input bytes     4,401,563
>>>>>>>>>     Combine input records     0
>>>>>>>>>     Map output records     0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pm...@decarta.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm running the LanczosSolver (from SVN two days ago) and it's
>>>>>>>>>>> taking what I think is a very long time to run. The matrix I'm
>>>>>>>>>>> using is very sparse. The command I used was:
>>>>>>>>>>>
>>>>>>>>>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359
>>>>>>>>>>> --numCols 2517335 --rank 10 --cleansvd true
>>>>>>>>>>> -Dmapred.child.java.opts=-Xmx2048m
>>>>>>>>>>>
>>>>>>>>>>> The first part of the program runs pretty quickly, and in less than
>>>>>>>>>>> 10 minutes it has found 10 eigenvectors. The second step (cleansvd)
>>>>>>>>>>> has been running for 15 hours and is only 10% of the way through
>>>>>>>>>>> the map tasks.
>>>>>>>>>>
>>>>>>>>>> 10 minutes for 10 eigenvectors is totally reasonable, depending on
>>>>>>>>>> how sparse this matrix is.  cleansvd should take roughly the *same*
>>>>>>>>>> amount of time as the original solving, because it's basically just
>>>>>>>>>> taking each of the vectors and re-multiplying them by the original
>>>>>>>>>> matrix to see how close to actual eigenvectors they are.  Do you
>>>>>>>>>> have any log output to share?  I can try to reproduce this on a
>>>>>>>>>> similarly sized matrix, but this definitely seems wrong.
>>>>>>>>>>
>>>>>>>>>>> Is this performance expected? Is there something I can do to speed
>>>>>>>>>>> it up?
>>>>>>>>>>
>>>>>>>>>> No, and probably not.  Other than help me find the bug that's
>>>>>>>>>> causing this.  I've never seen that behavior before.
>>>>>>>>>>
>>>>>>>>>>    -jake
>>>>>>>>>>
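A note on the verification step Jake describes: for each candidate eigenvector
v with eigenvalue lambda, "cleaning" multiplies v back through the matrix and
measures how far the result is from lambda * v, so it costs about one more
Lanczos-style pass per vector. A rough local sketch of that check using a toy
symmetric matrix in place of the corpus product (the real job distributes the
multiply via TimesSquaredJob; the class and method here are hypothetical):

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    public class EigenCheckSketch {
      // Relative residual ||Av - lambda*v|| / |lambda| for a tiny matrix A
      // given as rows: small residual means v is close to a true eigenvector.
      static double eigenError(Vector[] rows, Vector v, double lambda) {
        Vector av = new DenseVector(rows.length);
        for (int i = 0; i < rows.length; i++) {
          av.setQuick(i, rows[i].dot(v)); // row i of A times v
        }
        return av.minus(v.times(lambda)).norm(2) / Math.abs(lambda);
      }

      public static void main(String[] args) {
        Vector[] a = {
            new DenseVector(new double[] {2, 0}),
            new DenseVector(new double[] {0, 3})
        };
        Vector v = new DenseVector(new double[] {0, 1}); // eigenvector, lambda = 3
        System.out.println(eigenError(a, v, 3.0));       // prints 0.0
      }
    }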
>>>>>>> --------------------------
>>>>>>> Ken Krugler
>>>>>>> +1 530-210-6378
>>>>>>> http://bixolabs.com
>>>>>>> e l a s t i c   w e b   m i n i n g

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
Hellz yes.

On Apr 30, 2011 6:18 PM, "Lance Norskog" <go...@gmail.com> wrote:

And are we going to stop using 'instanceof'?

On Sat, Apr 30, 2011 at 3:34 PM, Paul Mahon <pm...@decarta.com> wrote:
> Yep, I can confirm, it to...
--
Lance Norskog
goksron@gmail.com

Re: LanczosSolver Very Slow

Posted by Lance Norskog <go...@gmail.com>.
And are we going to stop using 'instanceof'?

On Sat, Apr 30, 2011 at 3:34 PM, Paul Mahon <pm...@decarta.com> wrote:
> Yep, I can confirm, it took me about 23 minutes to run it this time.



-- 
Lance Norskog
goksron@gmail.com