Posted to user@mahout.apache.org by Paul Mahon <pm...@decarta.com> on 2011/04/29 18:45:32 UTC

LanczosSolver Very Slow

I'm running the LanczosSolver (from SVN two days ago) and it's taking
what I think is a very long time to run. The matrix I'm using is
very sparse. The command I used was:

bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m

The first part of the program runs pretty quickly, and in less than 10 
minutes it has found 10 eigenvectors. The second step (cleansvd) has 
been running for 15 hours and is only 10% of the way through the map 
tasks.

Is this performance expected? Is there something I can do to speed it up?

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
Hellz yes.

On Apr 30, 2011 6:18 PM, "Lance Norskog" <go...@gmail.com> wrote:

And are we going to stop using 'instanceof'?

On Sat, Apr 30, 2011 at 3:34 PM, Paul Mahon <pm...@decarta.com> wrote:
> Yep, I can confirm, it to...
--
Lance Norskog
goksron@gmail.com

Re: LanczosSolver Very Slow

Posted by Lance Norskog <go...@gmail.com>.
And are we going to stop using 'instanceof'?

On Sat, Apr 30, 2011 at 3:34 PM, Paul Mahon <pm...@decarta.com> wrote:
> Yep, I can confirm, it took me about 23 minutes to run it this time.
>
> On 04/30/2011 12:24 PM, Jake Mannix wrote:
>>
>> Paul, I ran your data this morning (on my laptop, pseudo distributed mode)
>>
>> INFO: Program took 1388093 ms
>>
>> Yay!
>>
>>   -jake
>>
>> On Fri, Apr 29, 2011 at 10:24 PM, Jake Mannix <ja...@gmail.com> wrote:
>>
>>> Great, let me know what you see.  I just checked in the last cleanup
>>> stuff.
>>>
>>>
>>> On Fri, Apr 29, 2011 at 9:55 PM, Paul Mahon<pm...@decarta.com>  wrote:
>>>
>>>> Thanks for the quick work tracking this down. I'll give it another try
>>>> and see if it's faster now.
>>>>
>>>>
>>>> On 04/29/2011 09:46 PM, Jake Mannix wrote:
>>>>
>>>>> The commit I made a few minutes ago resolving MAHOUT-639 most likely
>>>>> has fixed this, but there are a few cleanup issues left coming shortly
>>>>> (instanceof DenseVector / instanceof SequentialAccessSparseVector, at
>>>>> places where isDense() should be used, in case of NamedVector showing
>>>>> up), which avoid too many extra vector copies.
>>>>>
>>>>> Paul, try trunk again?
>>>>>
>>>>>   -jake
>>>>>
>>>>> On Fri, Apr 29, 2011 at 2:32 PM, Jake Mannix <ja...@gmail.com> wrote:
>>>>>
>>>>>> Ok, I think I've found it.  It was this issue, and another similar one.
>>>>>> Going to verify in a bit, and patch coming later this afternoon!
>>>>>>
>>>>>>   -jake
>>>>>>
>>>>>> On Fri, Apr 29, 2011 at 1:42 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>>>>>>
>>>>>>> I think this is actually a bug (or series of bugs) somewhere, not a
>>>>>>> configuration issue, as shown by the fact that LanczosSolver finishes
>>>>>>> just fine in 10 min.
>>>>>>>
>>>>>>> I'm looking into it.  I've already found one problem (TimesSquaredJob
>>>>>>> checks instanceof DenseVector instead of calling isDense(), which gets
>>>>>>> the wrong answer for NamedVector instances), seeing if this is the
>>>>>>> slowdown cause now.
>>>>>>>
>>>>>>>   -jake
>>>>>>>
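The pitfall described just above is the classic decorator-versus-instanceof
mismatch: NamedVector wraps another vector, so an instanceof test sees the
wrapper type, not the wrapped one. A minimal, self-contained sketch of the
idea, using simplified stand-in classes rather than the actual Mahout source:

    // Sketch only: simplified stand-ins for Mahout's vector classes.
    interface Vector { boolean isDense(); }

    class DenseVector implements Vector {
      public boolean isDense() { return true; }
    }

    // NamedVector-style decorator: wraps a delegate and forwards calls.
    class NamedVector implements Vector {
      private final Vector delegate;
      NamedVector(Vector delegate) { this.delegate = delegate; }
      public boolean isDense() { return delegate.isDense(); }
    }

    public class InstanceOfPitfall {
      public static void main(String[] args) {
        Vector v = new NamedVector(new DenseVector());
        // The wrapper is not itself a DenseVector, so instanceof says no...
        System.out.println(v instanceof DenseVector); // false
        // ...while the delegated capability check gives the right answer.
        System.out.println(v.isDense());              // true
      }
    }

A dense vector wrapped in a NamedVector is thus mislabeled as sparse and can
be routed down a much slower code path.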
>>>>>>> On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler <kkrugler_lists@transpac.com> wrote:
>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> One random guess here - the total in-use memory for all tasks means
>>>>>>>> you've run out of physical memory, and are using swap.
>>>>>>>>
>>>>>>>> E.g. the use of JVM memory by hash maps is one way this can happen.
>>>>>>>>
>>>>>>>> Once that occurs, your performance will become effectively 0, as
>>>>>>>> tasks compete with each other for memory.
>>>>>>>>
>>>>>>>> To see if this is happening, run 'top' on a slave and see how much
>>>>>>>> swap is being used.
>>>>>>>>
>>>>>>>> -- Ken
>>>>>>>>
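Ken's 'top' suggestion is the simplest way to check. If it is more convenient
to check from inside a JVM, a sketch using the HotSpot-specific
com.sun.management bean (assuming the cluster runs a JVM that provides it):

    import java.lang.management.ManagementFactory;

    public class SwapCheck {
      public static void main(String[] args) {
        // The com.sun.management extension of the OS MX bean exposes swap sizes.
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        long totalMb = os.getTotalSwapSpaceSize() / (1024 * 1024);
        long freeMb = os.getFreeSwapSpaceSize() / (1024 * 1024);
        System.out.println("swap used: " + (totalMb - freeMb)
            + " MB of " + totalMb + " MB");
      }
    }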
>>>>>>>> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>>>>>>>>
>>>>>>>>> I did dump the stack a few times. Here are a couple which came up
>>>>>>>>> most often:
>>>>>>>>>
>>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>>   java.lang.Thread.State: RUNNABLE
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>>
>>>>>>>>>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>>    at
>>>>>>>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>>
>>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>>   java.lang.Thread.State: RUNNABLE
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>>
>>>>>>>>>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>>    at
>>>>>>>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>>
>>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>>   java.lang.Thread.State: RUNNABLE
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>>
>>>>>>>>>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>>    at
>>>>>>>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
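All three hot traces sit in TimesSquaredJob's mapper doing element-wise work
(assign/getQuick/dot) against a hash-backed RandomAccessSparseVector, which
is consistent with a loop touching every one of the ~2.5 million column
indices per row instead of only the nonzero entries. A toy illustration of
that cost difference, using plain Java maps rather than Mahout's classes:

    import java.util.HashMap;
    import java.util.Map;

    public class SparseLoopCost {
      public static void main(String[] args) {
        int numCols = 2517335;            // column count from the post
        Map<Integer, Double> sparse = new HashMap<>();
        sparse.put(42, 1.0);
        sparse.put(1000000, 2.0);         // only two nonzero entries

        // Dense-style loop: one hash probe per column, ~2.5M probes per row.
        double sumDense = 0.0;
        for (int i = 0; i < numCols; i++) {
          sumDense += sparse.getOrDefault(i, 0.0);
        }

        // Sparse-style loop: touch only the stored nonzeros.
        double sumSparse = 0.0;
        for (double v : sparse.values()) {
          sumSparse += v;
        }
        System.out.println(sumDense + " == " + sumSparse);
      }
    }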
>>>>>>>>> On 04/29/2011 10:42 AM, Paul Mahon wrote:
>>>>>>>>>
>>>>>>>>>> The task logs don't have anything usable:
>>>>>>>>>> *_syslog logs_*
>>>>>>>>>>
>>>>>>>>>> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>>>>>>> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 50
>>>>>>>>>> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
>>>>>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
>>>>>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>>>>>>>>>>
>>>>>>>>>> The driver log isn't very interesting either:
>>>>>>>>>>
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete: job_201104221106_0128
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=402789200
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20143593
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle bytes=181251087
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the corpus so far...
>>>>>>>>>> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>>>>>>>>> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
>>>>>>>>>> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 2.7694996270467187
>>>>>>>>>> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 124.51350651732557
>>>>>>>>>> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 254.94694717051294
>>>>>>>>>> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 387.54002942185565
>>>>>>>>>> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with eigenvalue 501.70725582316834
>>>>>>>>>> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with eigenvalue 614.8114660750927
>>>>>>>>>> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with eigenvalue 700.1694351007611
>>>>>>>>>> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with eigenvalue 775.2520850269659
>>>>>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with eigenvalue 39441.64442962535
>>>>>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>>>>>>>>>> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>>>>>>>>>> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>> 11/04/28 14:29:12 INFO mapred.JobClient: Running job: job_201104221106_0129
>>>>>>>>>> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>>>>>>>>>> ...
>>>>>>>>>> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>>>>>>>>>> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>>>>>>>>>>
>>>>>>>>>> The counters don't mean much to me either. Here's one set from an incomplete mapper:
>>>>>>>>>>
>>>>>>>>>> *FileSystemCounters*
>>>>>>>>>>    HDFS_BYTES_READ     24,540,497
>>>>>>>>>>
>>>>>>>>>> *Map-Reduce Framework*
>>>>>>>>>>    Combine output records     0
>>>>>>>>>>    Map input records     115,815
>>>>>>>>>>    Spilled Records     0
>>>>>>>>>>    Map output bytes     0
>>>>>>>>>>    Map input bytes     4,401,563
>>>>>>>>>>    Combine input records     0
>>>>>>>>>>    Map output records     0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon <pm...@decarta.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm running the LanczosSolver (from SVN two days ago) and it's
>>>>>>>>>>>> taking what I think is a very long time to run. The matrix I'm
>>>>>>>>>>>> using is very sparse. The command I used was:
>>>>>>>>>>>>
>>>>>>>>>>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359
>>>>>>>>>>>> --numCols 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
>>>>>>>>>>>>
>>>>>>>>>>>> The first part of the program runs pretty quickly, and in less
>>>>>>>>>>>> than 10 minutes it has found 10 eigenvectors. The second step
>>>>>>>>>>>> (cleansvd) has been running for 15 hours and is only 10% of the
>>>>>>>>>>>> way through the map tasks.
>>>>>>>>>>>
>>>>>>>>>>> 10 minutes for 10 eigenvectors is totally reasonable, depending on
>>>>>>>>>>> how sparse this matrix is.  cleansvd should take roughly the *same*
>>>>>>>>>>> amount of time as the original solving, because it's basically just
>>>>>>>>>>> taking each of the vectors and re-multiplying them by the original
>>>>>>>>>>> matrix to see how close to actual eigenvectors they are.  Do you
>>>>>>>>>>> have any log output to share?  I can try to reproduce this on a
>>>>>>>>>>> similarly sized matrix, but this definitely seems wrong.
>>>>>>>>>>>
>>>>>>>>>>>> Is this performance expected? Is there something I can do to
>>>>>>>>>>>> speed it up?
>>>>>>>>>>>
>>>>>>>>>>> No, and probably not.  Other than help me find the bug that's
>>>>>>>>>>> causing this.  I've never seen that behavior before.
>>>>>>>>>>>
>>>>>>>>>>>   -jake
>>>>>>>>>>>
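The cleaning step Jake describes amounts to an eigenvector residual check:
multiply each candidate vector v by the matrix A again and measure how far
Av is from lambda*v. A minimal dense sketch of that idea (not Mahout's
actual verification code, which runs as a distributed job over the matrix):

    public class EigenResidual {
      // Returns ||A*v - lambda*v||; a small value means v is close to a
      // true eigenvector of A with eigenvalue lambda.
      static double residual(double[][] a, double[] v, double lambda) {
        int n = v.length;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
          double avI = 0.0;
          for (int j = 0; j < n; j++) {
            avI += a[i][j] * v[j];        // row i of A times v
          }
          double diff = avI - lambda * v[i];
          sumSq += diff * diff;
        }
        return Math.sqrt(sumSq);
      }

      public static void main(String[] args) {
        double[][] a = {{2, 0}, {0, 3}};
        double[] v = {1, 0};                       // eigenvector, eigenvalue 2
        System.out.println(residual(a, v, 2.0));   // 0.0
      }
    }

Each check is another full matrix-times-vector pass over the corpus, which is
why cleansvd should cost about as much as the Lanczos iterations themselves.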
>>>>>>>> --------------------------
>>>>>>>> Ken Krugler
>>>>>>>> +1 530-210-6378
>>>>>>>> http://bixolabs.com
>>>>>>>> e l a s t i c   w e b   m i n i n g



-- 
Lance Norskog
goksron@gmail.com

Re: LanczosSolver Very Slow

Posted by Paul Mahon <pm...@decarta.com>.
Yep, I can confirm, it took me about 23 minutes to run it this time.

On 04/30/2011 12:24 PM, Jake Mannix wrote:
> Paul, I ran your data this morning (on my laptop, pseudo distributed mode)
>
> INFO: Program took 1388093 ms
>
> Yay!
>
>    -jake

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
Paul, I ran your data this morning (on my laptop, pseudo distributed mode)

INFO: Program took 1388093 ms

Yay!

  -jake
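
(For reference, 1388093 ms / 60000 is about 23 minutes, matching the "about
23 minutes" Paul reports elsewhere in the thread.)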

On Fri, Apr 29, 2011 at 10:24 PM, Jake Mannix <ja...@gmail.com> wrote:

> Great, let me know what you see.  I just checked in the last cleanup stuff.

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
Great, let me know what you see.  I just checked in the last cleanup stuff.

On Fri, Apr 29, 2011 at 9:55 PM, Paul Mahon <pm...@decarta.com> wrote:

> Thanks for the quick work tracking this down. I'll give it another try and
> see if it's faster now.

Re: LanczosSolver Very Slow

Posted by Paul Mahon <pm...@decarta.com>.
Thanks for the quick work tracking this down. I'll give it another try
and see if it's faster now.

On 04/29/2011 09:46 PM, Jake Mannix wrote:
> The commit I made a few minutes ago resolving MAHOUT-639 most likely has
> fixed this, but there are a few cleanup issues left coming shortly
> (instanceof DenseVector / instanceof SequentialAccessSparseVector, at
> places where isDense() should be used, in case of NamedVector showing up),
> which avoid too many extra vector copies.
>
> Paul, try trunk again?
>
>    -jake
>
> On Fri, Apr 29, 2011 at 2:32 PM, Jake Mannix<ja...@gmail.com>  wrote:
>
>> Ok, I think I've found it.  It was this issue, and another similar one.
>>   Going to verify in a bit, and patch
>> coming later this afternoon!
>>
>>    -jake
>>
>>
>> On Fri, Apr 29, 2011 at 1:42 PM, Jake Mannix<ja...@gmail.com>wrote:
>>
>>> I think this is actually a bug (or series of bugs) somewhere, not a
>>> configuration issue, as shown by the fact
>>> that LanczosSolver finishes just fine in 10 min.
>>>
>>> I'm looking into it.  I've already found one problem (TimesSquaredJob
>>>>> checks instanceof DenseVector instead
>>> of calling isDense(), which gets the wrong answer for NamedVector
>>> instances), seeing if this is the slowdown
>>> cause now.
>>>
>>>    -jake
>>>
>>> On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler<kkrugler_lists@transpac.com
>>>> wrote:
>>>> Hi Paul,
>>>>
>>>> One random guess here - the total in-use memory for all tasks means
>>>> you've run out of physical memory, and are using swap.
>>>>
>>>> E.g. the use of JVM memory by hash maps is one way this can happen.
>>>>
>>>> Once that occurs, your performance will become effectively 0, as tasks
>>>> compete with each other for memory.
>>>>
>>>> To see if this is happening, run 'top' on a slave and see how much swap
>>>> is being used.
>>>>
>>>> -- Ken
>>>>
>>>> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>>>>
>>>>> I did dump the stack a few times. Here are a couple which came up most
>>>> often:
>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>     at
>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>>>>>     at
>>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>>>>>     at
>>>> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>>>>>     at
>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>
>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>     at
>>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>
>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>     at
>>>> org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>>>>>     at
>>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>
>>>>>
>>>>>
>>>>> On 04/29/2011 10:42 AM, Paul Mahon wrote:
>>>>>> The task logs don't have anything useable:
>>>>>> *_syslog logs_*
>>>>>>
>>>>>> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>> Initializing JVM Metrics with processName=MAP, sessionId=
>>>>>> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask:
>>>> numReduceTasks: 50
>>>>>> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask:
>>>> io.sort.mb = 100
>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data
>>>> buffer = 79691776/99614720
>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record
>>>> buffer = 262144/327680
>>>>>> The driver log isn't very interesting either:
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete:
>>>> job_201104221106_0128
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:
>>>> FILE_BYTES_WRITTEN=402789200
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:
>>>> HDFS_BYTES_WRITTEN=20143593
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input
>>>> records=12259359
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle
>>>> bytes=181251087
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output
>>>> bytes=201386850
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>>>>>> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the
>>>> corpus so far...
>>>>>> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration
>>>> complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>>>>> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with
>>>> eigenvalue 0.0
>>>>>> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with
>>>> eigenvalue 2.7694996270467187
>>>>>> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with
>>>> eigenvalue 124.51350651732557
>>>>>> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with
>>>> eigenvalue 254.94694717051294
>>>>>> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with
>>>> eigenvalue 387.54002942185565
>>>>>> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with
>>>> eigenvalue 501.70725582316834
>>>>>> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with
>>>> eigenvalue 614.8114660750927
>>>>>> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with
>>>> eigenvalue 700.1694351007611
>>>>>> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with
>>>> eigenvalue 775.2520850269659
>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with
>>>> eigenvalue 39441.64442962535
>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>>>>>> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting
>>>> 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>>>>>> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to
>>>> process : 1
>>>>>> 11/04/28 14:29:12 INFO mapred.JobClient: Running job:
>>>> job_201104221106_0129
>>>>>> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>>>>>> ...
>>>>>> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>>>>>> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>>>>>>
>>>>>>
>>>>>> The counters don't mean much to me either. Here's one set from an
>>>> incomplete mapper:
>>>>>> *FileSystemCounters*
>>>>>>
>>>>>>     HDFS_BYTES_READ     24,540,497
>>>>>>
>>>>>> *Map-Reduce Framework*
>>>>>>
>>>>>>     Combine output records     0
>>>>>>
>>>>>>     Map input records     115,815
>>>>>>
>>>>>>     Spilled Records     0
>>>>>>
>>>>>>     Map output bytes     0
>>>>>>
>>>>>>     Map input bytes     4,401,563
>>>>>>
>>>>>>     Combine input records     0
>>>>>>
>>>>>>     Map output records     0
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>>>>>>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pm...@decarta.com>
>>>>   wrote:
>>>>>>>> I'm running the LanczosSolver (from SVN two days ago) and it's taking
>>>>>>>> what I think is a very long time to run. The matrix I'm using is very
>>>>>>>> sparse.
>>>>>>>> The command I used was:
>>>>>>>>
>>>>>>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols
>>>>>>>> 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
>>>>>>>>
>>>>>>>> The first part of the program runs pretty quickly, and in less than
>>>> 10
>>>>>>>> minutes it has found 10 eigenvectors. The second step (cleansvd) has
>>>> been
>>>>>>>> running for 15 hours and is only 10% of the way through the map
>>>> tasks.
>>>>>>> 10 minutes for 10 eigenvectors is totally reasonable, depending on
>>>> how
>>>>>>> sparse this matrix is.  cleansvd should take roughly the *same*
>>>> amount of
>>>>>>> time as the original solving, because it's basically just taking each
>>>> of the
>>>>>>> vectors and re-multiplying them by the original matrix to see how
>>>> close to
>>>>>>> actual eigenvectors they are.  Do you have any log output to share?
>>>>   I can
>>>>>>> try to reproduce this on a similarly sized matrix, but this definitely
>>>>>>> seems wrong.
>>>>>>>
>>>>>>>
>>>>>>>> Is this performance expected? Is there something I can do to speed
>>>> it up?
>>>>>>> No, and probably not.  Other than help me find the bug that's causing
>>>> this.
>>>>>>>   I've never seen that behavior before.
>>>>>>>
>>>>>>>    -jake
>>>>>>>
>>>> --------------------------
>>>> Ken Krugler
>>>> +1 530-210-6378
>>>> http://bixolabs.com
>>>> e l a s t i c   w e b   m i n i n g
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
The commit I made a few minutes ago resolving MAHOUT-639 most likely has
fixed this, but there are
a few cleanup issues left coming shortly (instanceof DenseVector /
instanceof
SequentialAccessSparseVector, at places where isDense() should be used, in
case of NamedVector
showing up), which will avoid too many extra vector copies.
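
For anyone following along, the cleanup is just a matter of dispatching
on the vector's declared properties instead of its concrete class. A
minimal sketch of the pattern (the method below is illustrative, not the
actual Mahout code, though isDense() and isSequentialAccess() are real
methods on org.apache.mahout.math.Vector):

    import org.apache.mahout.math.Vector;

    final class VectorPathChooser {
      // Pick an iteration strategy from the vector's properties, so a
      // wrapper like NamedVector (which delegates these calls) still
      // lands on the fast path for the data it wraps.
      static String pathFor(Vector v) {
        if (v.isDense()) {
          return "dense";              // iterate all indices directly
        } else if (v.isSequentialAccess()) {
          return "sequential-sparse";  // iterate nonzeros in order
        } else {
          return "random-sparse";      // hash-backed random lookups
        }
      }
    }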

Paul, try trunk again?

  -jake

On Fri, Apr 29, 2011 at 2:32 PM, Jake Mannix <ja...@gmail.com> wrote:

> Ok, I think I've found it.  It was this issue, and another similar one.
>  Going to verify in a bit, and patch
> coming later this afternoon!
>
>   -jake
>
>
> On Fri, Apr 29, 2011 at 1:42 PM, Jake Mannix <ja...@gmail.com>wrote:
>
>> I think this is actually a bug (or series of bugs) somewhere, not a
>> configuration issue, as shown by the fact
>> that LanczosSolver finishes just fine in 10 min.
>>
>> I'm looking into it.  I've already found one problem (TimesSquaredJob
>> checks instanceof DenseVector instead
>> of calling isDense(), which gets the wrong answer for NamedVector
>> instances), seeing if this is the slowdown
>> cause now.
>>
>>   -jake
>>
>> On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler <kkrugler_lists@transpac.com
>> > wrote:
>>
>>> Hi Paul,
>>>
>>> One random guess here - the total in-use memory for all tasks means
>>> you've run out of physical memory, and are using swap.
>>>
>>> E.g. the use of JVM memory by hash maps is one way this can happen.
>>>
>>> Once that occurs, your performance will become effectively 0, as tasks
>>> compete with each other for memory.
>>>
>>> To see if this is happening, run 'top' on a slave and see how much swap
>>> is being used.
>>>
>>> -- Ken
>>>
>>> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>>>
>>> > I did dump the stack a few times. Here are a couple which came up most
>>> often:
>>> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>> >   java.lang.Thread.State: RUNNABLE
>>> >    at
>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>>> >    at
>>> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>>> >    at
>>> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>>> >    at
>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> >
>>> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>> >   java.lang.Thread.State: RUNNABLE
>>> >    at
>>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> >
>>> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>> >   java.lang.Thread.State: RUNNABLE
>>> >    at
>>> org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>>> >    at
>>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> >
>>> >
>>> >
>>> > On 04/29/2011 10:42 AM, Paul Mahon wrote:
>>> >> The task logs don't have anything useable:
>>> >> *_syslog logs_*
>>> >>
>>> >> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>> Initializing JVM Metrics with processName=MAP, sessionId=
>>> >> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask:
>>> numReduceTasks: 50
>>> >> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask:
>>> io.sort.mb = 100
>>> >> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data
>>> buffer = 79691776/99614720
>>> >> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record
>>> buffer = 262144/327680
>>> >>
>>> >> The driver log isn't very interesting either:
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete:
>>> job_201104221106_0128
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:
>>> FILE_BYTES_WRITTEN=402789200
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:
>>> HDFS_BYTES_WRITTEN=20143593
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input
>>> records=12259359
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle
>>> bytes=181251087
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output
>>> bytes=201386850
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>>> >> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the
>>> corpus so far...
>>> >> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration
>>> complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>> >> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with
>>> eigenvalue 0.0
>>> >> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with
>>> eigenvalue 2.7694996270467187
>>> >> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with
>>> eigenvalue 124.51350651732557
>>> >> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with
>>> eigenvalue 254.94694717051294
>>> >> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with
>>> eigenvalue 387.54002942185565
>>> >> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with
>>> eigenvalue 501.70725582316834
>>> >> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with
>>> eigenvalue 614.8114660750927
>>> >> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with
>>> eigenvalue 700.1694351007611
>>> >> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with
>>> eigenvalue 775.2520850269659
>>> >> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with
>>> eigenvalue 39441.64442962535
>>> >> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>>> >> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting
>>> 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>>> >> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to
>>> process : 1
>>> >> 11/04/28 14:29:12 INFO mapred.JobClient: Running job:
>>> job_201104221106_0129
>>> >> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>> >> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>>> >> ...
>>> >> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>>> >> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>>> >>
>>> >>
>>> >> The counters don't mean much to me either. Here's one set from an
>>> incomplete mapper:
>>> >> *FileSystemCounters*
>>> >>
>>> >>    HDFS_BYTES_READ     24,540,497
>>> >>
>>> >> *Map-Reduce Framework*
>>> >>
>>> >>    Combine output records     0
>>> >>
>>> >>    Map input records     115,815
>>> >>
>>> >>    Spilled Records     0
>>> >>
>>> >>    Map output bytes     0
>>> >>
>>> >>    Map input bytes     4,401,563
>>> >>
>>> >>    Combine input records     0
>>> >>
>>> >>    Map output records     0
>>> >>
>>> >>
>>> >>
>>> >> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>>> >>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pm...@decarta.com>
>>>  wrote:
>>> >>>
>>> >>>> I'm running the LanczosSolver (from SVN two days ago) and it's taking
>>> >>>> what I think is a very long time to run. The matrix I'm using is very
>>> >>>> sparse.
>>> >>>> The command I used was:
>>> >>>>
>>> >>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols
>>> >>>> 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
>>> >>>>
>>> >>>> The first part of the program runs pretty quickly, and in less than
>>> 10
>>> >>>> minutes it has found 10 eigenvectors. The second step (cleansvd) has
>>> been
>>> >>>> running for 15 hours and is only 10% of the way through the map
>>> tasks.
>>> >>>>
>>> >>> 10 minutes for 10 eigenvectors is totally reasonable, depending on
>>> how
>>> >>> sparse this matrix is.  cleansvd should take roughly the *same*
>>> amount of
>>> >>> time as the original solving, because it's basically just taking each
>>> of the
>>> >>> vectors and re-multiplying them by the original matrix to see how
>>> close to
>>> >>> actual eigenvectors they are.  Do you have any log output to share?
>>>  I can
>>> >>> try to reproduce this on a similarly sized matrix, but this definitely
>>> >>> seems wrong.
>>> >>>
>>> >>>
>>> >>>> Is this performance expected? Is there something I can do to speed
>>> it up?
>>> >>>>
>>> >>> No, and probably not.  Other than help me find the bug that's causing
>>> this.
>>> >>>  I've never seen that behavior before.
>>> >>>
>>> >>>   -jake
>>> >>>
>>> >>
>>>
>>> --------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://bixolabs.com
>>> e l a s t i c   w e b   m i n i n g
>>>
>>>
>>>
>>>
>>>
>>>
>>
>

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
Ok, I think I've found it.  It was this issue, and another similar one.
 Going to verify in a bit, and patch
coming later this afternoon!

  -jake

On Fri, Apr 29, 2011 at 1:42 PM, Jake Mannix <ja...@gmail.com> wrote:

> I think this is actually a bug (or series of bugs) somewhere, not a
> configuration issue, as shown by the fact
> that LanczosSolver finishes just fine in 10 min.
>
> I'm looking into it.  I've already found one problem (TimesSquaredJob
> checks instanceof DenseVector instead
> of calling isDense(), which gets the wrong answer for NamedVector
> instances), seeing if this is the slowdown
> cause now.
>
>   -jake
>
> On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler <kk...@transpac.com>wrote:
>
>> Hi Paul,
>>
>> One random guess here - the total in-use memory for all tasks means you've
>> run out of physical memory, and are using swap.
>>
>> E.g. the use of JVM memory by hash maps is one way this can happen.
>>
>> Once that occurs, your performance will become effectively 0, as tasks
>> compete with each other for memory.
>>
>> To see if this is happening, run 'top' on a slave and see how much swap is
>> being used.
>>
>> -- Ken
>>
>> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>>
>> > I did dump the stack a few times. Here are a couple which came up most
>> often:
>> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>> >   java.lang.Thread.State: RUNNABLE
>> >    at
>> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>> >    at
>> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>> >    at
>> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>> >    at
>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >
>> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>> >   java.lang.Thread.State: RUNNABLE
>> >    at
>> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >
>> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>> >   java.lang.Thread.State: RUNNABLE
>> >    at
>> org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>> >    at
>> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >
>> >
>> >
>> > On 04/29/2011 10:42 AM, Paul Mahon wrote:
>> >> The task logs don't have anything useable:
>> >> *_syslog logs_*
>> >>
>> >> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=MAP, sessionId=
>> >> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask:
>> numReduceTasks: 50
>> >> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask:
>> io.sort.mb = 100
>> >> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data
>> buffer = 79691776/99614720
>> >> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record
>> buffer = 262144/327680
>> >>
>> >> The driver log isn't very interesting either:
>> >> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete:
>> job_201104221106_0128
>> >> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:
>> FILE_BYTES_WRITTEN=402789200
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:
>> HDFS_BYTES_WRITTEN=20143593
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle
>> bytes=181251087
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>> >> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the
>> corpus so far...
>> >> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration
>> complete - now to diagonalize the tri-diagonal auxiliary matrix.
>> >> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with
>> eigenvalue 0.0
>> >> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with
>> eigenvalue 2.7694996270467187
>> >> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with
>> eigenvalue 124.51350651732557
>> >> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with
>> eigenvalue 254.94694717051294
>> >> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with
>> eigenvalue 387.54002942185565
>> >> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with
>> eigenvalue 501.70725582316834
>> >> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with
>> eigenvalue 614.8114660750927
>> >> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with
>> eigenvalue 700.1694351007611
>> >> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with
>> eigenvalue 775.2520850269659
>> >> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with
>> eigenvalue 39441.64442962535
>> >> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>> >> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting
>> 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>> >> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to
>> process : 1
>> >> 11/04/28 14:29:12 INFO mapred.JobClient: Running job:
>> job_201104221106_0129
>> >> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>> >> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>> >> ...
>> >> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>> >> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>> >>
>> >>
>> >> The counters don't mean much to me either. Here's one set from an
>> incomplete mapper:
>> >> *FileSystemCounters*
>> >>
>> >>    HDFS_BYTES_READ     24,540,497
>> >>
>> >> *Map-Reduce Framework*
>> >>
>> >>    Combine output records     0
>> >>
>> >>    Map input records     115,815
>> >>
>> >>    Spilled Records     0
>> >>
>> >>    Map output bytes     0
>> >>
>> >>    Map input bytes     4,401,563
>> >>
>> >>    Combine input records     0
>> >>
>> >>    Map output records     0
>> >>
>> >>
>> >>
>> >> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>> >>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pm...@decarta.com>
>>  wrote:
>> >>>
>> >>>> I'm running the LanczosSolver (from SVN two days ago) and it's taking
>> >>>> what I think is a very long time to run. The matrix I'm using is very
>> >>>> sparse.
>> >>>> The command I used was:
>> >>>>
>> >>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols
>> >>>> 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
>> >>>>
>> >>>> The first part of the program runs pretty quickly, and in less than
>> 10
>> >>>> minutes it has found 10 eigenvectors. The second step (cleansvd) has
>> been
>> >>>> running for 15 hours and is only 10% of the way through the map
>> tasks.
>> >>>>
>> >>> 10 minutes for 10 eigenvectors is totally reasonable, depending on how
>> >>> sparse this matrix is.  cleansvd should take roughly the *same* amount
>> of
>> >>> time as the original solving, because it's basically just taking each
>> of the
>> >>> vectors and re-multiplying them by the original matrix to see how
>> close to
>> >>> actual eigenvectors they are.  Do you have any log output to share?  I
>> can
>> >>> try to reproduce this on a similarly sized matrix, but this definitely
>> >>> seems wrong.
>> >>>
>> >>>
>> >>>> Is this performance expected? Is there something I can do to speed it
>> up?
>> >>>>
>> >>> No, and probably not.  Other than help me find the bug that's causing
>> this.
>> >>>  I've never seen that behavior before.
>> >>>
>> >>>   -jake
>> >>>
>> >>
>>
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://bixolabs.com
>> e l a s t i c   w e b   m i n i n g
>>
>>
>>
>>
>>
>>
>

Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
I think this is actually a bug (or series of bugs) somewhere, not a
configuration issue, as shown by the fact
that LanczosSolver finishes just fine in 10 min.

I'm looking into it.  I've already found one problem (TimesSquaredJob checks
instanceof DenseVector instead
of calling isDense(), which gets the wrong answer for NamedVector
instances), seeing if this is the slowdown
cause now.
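
To make the failure mode concrete, here is a minimal sketch (the check
is a simplification of what TimesSquaredJob does, not the verbatim
code; NamedVector, DenseVector and Vector#isDense() are the actual
Mahout types and calls):

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.NamedVector;
    import org.apache.mahout.math.Vector;

    public class InstanceofPitfall {
      public static void main(String[] args) {
        Vector row = new NamedVector(
            new DenseVector(new double[] {1.0, 2.0, 3.0}), "row0");

        // The wrapper itself is not a DenseVector...
        System.out.println(row instanceof DenseVector); // false
        // ...but the data it delegates to is dense.
        System.out.println(row.isDense());              // true
      }
    }

So every dense row that shows up wrapped in a NamedVector gets routed
down the sparse code path, which lines up with the
RandomAccessSparseVector and OpenIntDoubleHashMap frames in Paul's
stack dumps.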

  -jake

On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler <kk...@transpac.com>wrote:

> Hi Paul,
>
> One random guess here - the total in-use memory for all tasks means you've
> run out of physical memory, and are using swap.
>
> E.g. the use of JVM memory by hash maps is one way this can happen.
>
> Once that occurs, your performance will become effectively 0, as tasks
> compete with each other for memory.
>
> To see if this is happening, run 'top' on a slave and see how much swap is
> being used.
>
> -- Ken
>
> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>
> > I did dump the stack a few times. Here are a couple which came up most
> often:
> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
> >   java.lang.Thread.State: RUNNABLE
> >    at
> org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
> >    at
> org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
> >    at
> org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
> >    at
> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
> >   java.lang.Thread.State: RUNNABLE
> >    at
> org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
> >   java.lang.Thread.State: RUNNABLE
> >    at
> org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
> >    at
> org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> >
> >
> > On 04/29/2011 10:42 AM, Paul Mahon wrote:
> >> The task logs don't have anything useable:
> >> *_syslog logs_*
> >>
> >> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=MAP, sessionId=
> >> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask:
> numReduceTasks: 50
> >> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask:
> io.sort.mb = 100
> >> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data
> buffer = 79691776/99614720
> >> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record
> buffer = 262144/327680
> >>
> >> The driver log isn't very interesting either:
> >> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete:
> job_201104221106_0128
> >> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
> >> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
> >> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
> >> 11/04/28 14:26:45 INFO mapred.JobClient:
> FILE_BYTES_WRITTEN=402789200
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20143593
> >> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle
> bytes=181251087
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
> >> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
> >> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the
> corpus so far...
> >> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration complete
> - now to diagonalize the tri-diagonal auxiliary matrix.
> >> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with
> eigenvalue 0.0
> >> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with
> eigenvalue 2.7694996270467187
> >> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with
> eigenvalue 124.51350651732557
> >> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with
> eigenvalue 254.94694717051294
> >> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with
> eigenvalue 387.54002942185565
> >> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with
> eigenvalue 501.70725582316834
> >> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with
> eigenvalue 614.8114660750927
> >> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with
> eigenvalue 700.1694351007611
> >> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with
> eigenvalue 775.2520850269659
> >> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with
> eigenvalue 39441.64442962535
> >> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
> >> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting
> 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
> >> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> >> 11/04/28 14:29:12 INFO mapred.JobClient: Running job:
> job_201104221106_0129
> >> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
> >> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
> >> ...
> >> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
> >> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
> >>
> >>
> >> The counters don't mean much to me either. Here's one set from an
> incomplete mapper:
> >> *FileSystemCounters*
> >>
> >>    HDFS_BYTES_READ     24,540,497
> >>
> >> *Map-Reduce Framework*
> >>
> >>    Combine output records     0
> >>
> >>    Map input records     115,815
> >>
> >>    Spilled Records     0
> >>
> >>    Map output bytes     0
> >>
> >>    Map input bytes     4,401,563
> >>
> >>    Combine input records     0
> >>
> >>    Map output records     0
> >>
> >>
> >>
> >> On 04/29/2011 10:28 AM, Jake Mannix wrote:
> >>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pm...@decarta.com>
>  wrote:
> >>>
> >>>> I'm running the LanczosSolver (from SVN two days ago) and it's taking
> >>>> what I think is a very long time to run. The matrix I'm using is very
> >>>> sparse.
> >>>> The command I used was:
> >>>>
> >>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols
> >>>> 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
> >>>>
> >>>> The first part of the program runs pretty quickly, and in less than 10
> >>>> minutes it has found 10 eigenvectors. The second step (cleansvd) has
> been
> >>>> running for 15 hours and is only 10% of the way through the map tasks.
> >>>>
> >>> 10 minutes for 10 eigenvectors is totally reasonable, depending on how
> >>> sparse this matrix is.  cleansvd should take roughly the *same* amount
> of
> >>> time as the original solving, because it's basically just taking each
> of the
> >>> vectors and re-multiplying them by the original matrix to see how close
> to
> >>> actual eigenvectors they are.  Do you have any log output to share?  I
> can
> >>> try to reproduce this on a similarly sized matrix, but this definitely
> >>> seems wrong.
> >>>
> >>>
> >>>> Is this performance expected? Is there something I can do to speed it
> up?
> >>>>
> >>> No, and probably not.  Other than help me find the bug that's causing
> this.
> >>>  I've never seen that behavior before.
> >>>
> >>>   -jake
> >>>
> >>
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>
>

Re: LanczosSolver Very Slow

Posted by Paul Mahon <pm...@decarta.com>.
A good thought, but memory use is fine: less than 15% on the machines 
involved.

On 04/29/2011 01:08 PM, Ken Krugler wrote:
> Hi Paul,
>
> One random guess here - the total in-use memory for all tasks means you've run out of physical memory, and are using swap.
>
> E.g. the use of JVM memory by hash maps is one way this can happen.
>
> Once that occurs, your performance will become effectively 0, as tasks compete with each other for memory.
>
> To see if this is happening, run 'top' on a slave and see how much swap is being used.
>
> -- Ken
>
> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>
>> I did dump the stack a few times. Here are a couple which came up most often:
>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>    java.lang.Thread.State: RUNNABLE
>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>>     at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>>     at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>    java.lang.Thread.State: RUNNABLE
>>     at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>    java.lang.Thread.State: RUNNABLE
>>     at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>
>>
>> On 04/29/2011 10:42 AM, Paul Mahon wrote:
>>> The task logs don't have anything useable:
>>> *_syslog logs_*
>>>
>>> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 50
>>> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>>>
>>> The driver log isn't very interesting either:
>>> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete: job_201104221106_0128
>>> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>>> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=402789200
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20143593
>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle bytes=181251087
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>>> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the corpus so far...
>>> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
>>> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 2.7694996270467187
>>> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 124.51350651732557
>>> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 254.94694717051294
>>> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 387.54002942185565
>>> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with eigenvalue 501.70725582316834
>>> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with eigenvalue 614.8114660750927
>>> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with eigenvalue 700.1694351007611
>>> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with eigenvalue 775.2520850269659
>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with eigenvalue 39441.64442962535
>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>>> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>>> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to process : 1
>>> 11/04/28 14:29:12 INFO mapred.JobClient: Running job: job_201104221106_0129
>>> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>>> ...
>>> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>>> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>>>
>>>
>>> The counters don't mean much to me either. Here's one set from an incomplete mapper:
>>> *FileSystemCounters*
>>>
>>>     HDFS_BYTES_READ     24,540,497
>>>
>>> *Map-Reduce Framework*
>>>
>>>     Combine output records     0
>>>
>>>     Map input records     115,815
>>>
>>>     Spilled Records     0
>>>
>>>     Map output bytes     0
>>>
>>>     Map input bytes     4,401,563
>>>
>>>     Combine input records     0
>>>
>>>     Map output records     0
>>>
>>>
>>>
>>> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>>>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pm...@decarta.com>   wrote:
>>>>
>>>>> I'm running the LanczosSolver (from SVN two days ago) and it's taking what
>>>>> I think is a very long time to run. The matrix I'm using is very sparse.
>>>>> The command I used was:
>>>>>
>>>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols
>>>>> 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
>>>>>
>>>>> The first part of the program runs pretty quickly, and in less than 10
>>>>> minutes it has found 10 eigenvectors. The second step (cleansvd) has been
>>>>> running for 15 hours and is only 10% of the way through the map tasks.
>>>>>
>>>> 10 minutes for 10 eigenvectors is totally reasonable, depending on how
>>>> sparse this matrix is.  cleansvd should take roughly the *same* amount of
>>>> time as the original solving, because it's basically just taking each of the
>>>> vectors and re-multiplying them by the original matrix to see how close to
>>>> actual eigenvectors they are.  Do you have any log output to share?  I can
>>>> try to reproduce this on a similarly sized matrix, but this definitely seems
>>>> wrong.
>>>>
>>>>
>>>>> Is this performance expected? Is there something I can do to speed it up?
>>>>>
>>>> No, and probably not.  Other than help me find the bug that's causing this.
>>>>   I've never seen that behavior before.
>>>>
>>>>    -jake
>>>>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>

Re: LanczosSolver Very Slow

Posted by Ken Krugler <kk...@transpac.com>.
Hi Paul,

One random guess here - the total in-use memory for all tasks means you've run out of physical memory, and are using swap.

E.g. the use of JVM memory by hash maps is one way this can happen.

Once that occurs, your performance will become effectively 0, as tasks compete with each other for memory.

To see if this is happening, run 'top' on a slave and see how much swap is being used.
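
If it's easier to capture this from inside the task JVMs than from a
shell, the same numbers are exposed through JMX. A small sketch, which
assumes a HotSpot JVM where the com.sun.management extension of
OperatingSystemMXBean is available:

    import java.lang.management.ManagementFactory;

    public class SwapCheck {
      public static void main(String[] args) {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        long totalMb = os.getTotalSwapSpaceSize() >> 20;
        long freeMb  = os.getFreeSwapSpaceSize() >> 20;
        System.out.println("swap used: " + (totalMb - freeMb)
            + " MB of " + totalMb + " MB");
      }
    }

A box that's thrashing will show swap-used climbing toward the total
while map progress stalls.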

-- Ken

On Apr 29, 2011, at 11:49am, Paul Mahon wrote:

> I did dump the stack a few times. Here are a couple which came up most often:
> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>   java.lang.Thread.State: RUNNABLE
>    at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>    at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>    at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>    at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>   java.lang.Thread.State: RUNNABLE
>    at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>   java.lang.Thread.State: RUNNABLE
>    at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>    at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>    at org.apache.hadoop.mapred.Child.main(Child.java:170)

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: LanczosSolver Very Slow

Posted by Paul Mahon <pm...@decarta.com>.
I did dump the stack a few times. Here are a couple which came up most often:

"main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
    java.lang.Thread.State: RUNNABLE
     at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
     at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
     at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
     at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)

"main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
    java.lang.Thread.State: RUNNABLE
     at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)

"main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
    java.lang.Thread.State: RUNNABLE
     at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)
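
All three hot traces are inside TimesSquaredMapper.map, in element-by-element loops: AbstractVector.assign and dot are walking indices one at a time, and on a RandomAccessSparseVector every single access is a hash probe (the OpenIntDoubleHashMap frames). As a rough model of why a dense-style loop over a hash-backed sparse vector is so much slower than iterating its non-zeros, here is a self-contained sketch; it uses a plain java.util.HashMap as a stand-in for the sparse vector, so none of this is Mahout's actual code path:

import java.util.HashMap;
import java.util.Map;

public class SparseLoopCost {
    public static void main(String[] args) {
        int numCols = 2517335;  // column count from the job in this thread
        Map<Integer, Double> sparse = new HashMap<Integer, Double>();
        for (int i = 0; i < numCols; i += 1000) {
            sparse.put(i, 1.0);  // roughly 0.1% of the entries are non-zero
        }

        // Dense-style loop (what an assign()/getQuick() walk amounts to):
        // one hash probe per column, zeros included.
        long t0 = System.nanoTime();
        double denseSum = 0.0;
        for (int i = 0; i < numCols; i++) {
            Double v = sparse.get(i);
            if (v != null) {
                denseSum += v;
            }
        }
        long denseNanos = System.nanoTime() - t0;

        // Sparse-style loop: touch only the stored non-zeros.
        long t1 = System.nanoTime();
        double sparseSum = 0.0;
        for (double v : sparse.values()) {
            sparseSum += v;
        }
        long sparseNanos = System.nanoTime() - t1;

        System.out.printf("dense-style: %.1f ms, sparse-style: %.1f ms (sums %.0f / %.0f)%n",
                          denseNanos / 1e6, sparseNanos / 1e6, denseSum, sparseSum);
    }
}

A dense-style pass does numCols hash lookups for every row it touches; over 12 million rows that multiplies out quickly, which would fit map tasks that keep reading input but barely advance.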




Re: LanczosSolver Very Slow

Posted by Paul Mahon <pm...@decarta.com>.
The task logs don't have anything usable:
*_syslog logs_*

2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 50
2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680

The driver log isn't very interesting either:
11/04/28 14:26:45 INFO mapred.JobClient: Job complete: job_201104221106_0128
11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=402789200
11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20143593
11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle bytes=181251087
11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the corpus so far...
11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 2.7694996270467187
11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 124.51350651732557
11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 254.94694717051294
11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 387.54002942185565
11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with eigenvalue 501.70725582316834
11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with eigenvalue 614.8114660750927
11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with eigenvalue 700.1694351007611
11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with eigenvalue 775.2520850269659
11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with eigenvalue 39441.64442962535
11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/28 14:29:12 INFO mapred.JobClient: Running job: job_201104221106_0129
11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
...
11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%


The counters don't mean much to me either. Here's one set from an incomplete mapper:

*FileSystemCounters*
    HDFS_BYTES_READ           24,540,497

*Map-Reduce Framework*
    Combine output records    0
    Map input records         115,815
    Spilled Records           0
    Map output bytes          0
    Map input bytes           4,401,563
    Combine input records     0
    Map output records        0
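
(For what it's worth, the zero map-output counters mid-run are probably not alarming in themselves: the driver counters for the fast job above show ten map tasks producing ten output records in total, i.e. one per mapper, which is consistent with each TimesSquaredJob mapper accumulating its partial product in memory and emitting a single vector only when it finishes. The zeros just say the mapper is still mid-pass.)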




Re: LanczosSolver Very Slow

Posted by Jake Mannix <ja...@gmail.com>.
On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon <pm...@decarta.com> wrote:

> I'm running the LanczosSolver (from SVN two days ago) and it's taking what
> I think is a very long time to run. The matrix I'm using is very sparse.
> The command I used was:
>
> bin/mahout svd --input ./base --output ./svd --numRows 12259359 --numCols
> 2517335 --rank 10 --cleansvd true -Dmapred.child.java.opts=-Xmx2048m
>
> The first part of the program runs pretty quickly, and in less than 10
> minutes it has found 10 eigenvectors. The second step (cleansvd) has been
> running for 15 hours and is only 10% of the way through the map tasks.
>

10 minutes for 10 eigenvectors is totally reasonable, depending on how
sparse this matrix is.  cleansvd should take roughly the *same* amount of
time as the original solving, because it's basically just taking each of the
vectors and re-multiplying them by the original matrix to see how close to
actual eigenvectors they are.  Do you have any log output to share?  I can
try to reproduce this on a similarly sized matrix, but this definitely seems
wrong.
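
To make that check concrete, here is a minimal in-memory sketch of it (plain double arrays rather than the distributed TimesSquaredJob, and the class and method names here are illustrative, not Mahout's API): for each candidate vector v, compute w = A^T(A v), estimate the eigenvalue of A^T A as lambda = (v . w) / (v . v), and report ||w - lambda*v|| as the cleaning error.

public class EigenVerifySketch {
    // A * v
    static double[] times(double[][] a, double[] v) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < v.length; j++) {
                r[i] += a[i][j] * v[j];
            }
        }
        return r;
    }

    // A^T * u
    static double[] timesTranspose(double[][] a, double[] u) {
        double[] r = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < r.length; j++) {
                r[j] += a[i][j] * u[i];
            }
        }
        return r;
    }

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    // Residual norm ||A^T A v - lambda v||: near zero means v really is an
    // eigenvector of A^T A, i.e. a right singular vector of A.
    static double residual(double[][] a, double[] v) {
        double[] w = timesTranspose(a, times(a, v));  // one pass over A
        double lambda = dot(v, w) / dot(v, v);        // eigenvalue estimate
        double sum = 0.0;
        for (int i = 0; i < v.length; i++) {
            double d = w[i] - lambda * v[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] a = {{2.0, 0.0}, {0.0, 1.0}};  // A^T A = diag(4, 1)
        double[] good = {1.0, 0.0};               // true eigenvector
        double[] bad  = {0.6, 0.8};               // not an eigenvector
        System.out.println(residual(a, good));    // ~0.0
        System.out.println(residual(a, bad));     // clearly non-zero
    }
}

Each vector costs one multiply through A, which is why the clean pass should be in the same ballpark as one Lanczos pass per eigenvector, not orders of magnitude slower.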


> Is this performance expected? Is there something I can do to speed it up?
>

No, and probably not, other than helping me find the bug that's causing this.
I've never seen that behavior before.

  -jake