Posted to user@mahout.apache.org by Danny Bickson <da...@gmail.com> on 2011/03/05 01:33:16 UTC

Re: Fwd: Another mahout ALS question

Hi Sebastian,
As promised, you can find some results from testing your ALS code on 64
high-performance Amazon EC2 machines (with up to 1,024 cores).
http://bickson.blogspot.com/2011/03/tunning-hadoop-configuration-for-high.html

I would love to get any feedback you or others may have about the setup of
this experiment.

Best,

Danny Bickson

On Wed, Feb 23, 2011 at 4:41 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Danny,
>
> please send all mails to user@mahout.apache.org instead of directly
> sending them to me, there are a lot of smart people on that list that might
> join with advice.
>
> I'm very excited that you are testing this code so intensively, and I'm
> positively surprised to see it give good results. Thank you for the effort
> you put into that!
>
> The exception seems to occur when ALSEvaluator is run. The code uses a
> quick-and-dirty approach to compute the model's error: it simply loads the
> user and item feature matrices completely into memory. As the number of
> features grows, memory consumption becomes too large.
>
> The code of that evaluator step needs to be changed so that each
> (user,item) pair whose rating is to be predicted gets joined with the
> corresponding user and item feature vectors: both are mapped to the same
> key and go to the same reducer, which can then compute the error.
>
> I already started implementing something like this, but unfortunately I
> don't have a lot of time these days. I could update the patch during the
> next week if that's OK with you.
>
> --sebastian
>
>
>
>
> On 23.02.2011 21:57, Danny Bickson wrote:
>
>> Another exception I am getting:
>>
>> 11/02/23 20:45:34 INFO common.AbstractJob: Command line arguments:
>> {--endPhase=2147483647, --itemFeatures=/tmp/als/out/M/
>> , --probes=/user/ubuntu/myout/probeSet/, --startPhase=0, --tempDir=temp,
>> --userFeatures=/tmp/als/out/U/}
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>        at
>>
>> org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:433)
>>        at
>>
>> org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>>        at
>>
>> org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>>        at
>> org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:113)
>>        at
>>
>> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>>        at
>> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>>        at
>>
>> org.apache.mahout.utils.eval.ALSEvaluator.readMatrix(ALSEvaluator.java:113)
>>        at
>> org.apache.mahout.utils.eval.ALSEvaluator.run(ALSEvaluator.java:71)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>        at
>> org.apache.mahout.utils.eval.ALSEvaluator.main(ALSEvaluator.java:52)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:616)
>>        at
>>
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>        at
>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>        at
>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:616)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> THANKS!
>> ---------- Forwarded message ----------
>> From: *Danny Bickson* <danny.bickson@gmail.com
>> <ma...@gmail.com>>
>> Date: Wed, Feb 23, 2011 at 3:05 PM
>> Subject: Another mahout ALS question
>> To: ssc@apache.org <ma...@apache.org>
>>
>>
>> Hi!
>> I successfully ran 10 iterations of your ALS code with D=20 and
>> lambda=0.065, and I get a very impressive RMSE of 0.93.
>> However, when I try to increase D, I get various out-of-memory errors,
>> even with a small Netflix subsample of 3M values.
>>
>> One of the errors I am getting is in the evaluateALS step:
>> 11/02/23 19:04:11 WARN driver.MahoutDriver: No evaluateALS.props found
>> on classpath, will use command-line arguments only
>> 11/02/23 19:04:12 INFO common.AbstractJob: Command line arguments:
>> {--endPhase=2147483647, --itemFeatures=/tmp/als/out/M/,
>> --probes=/user/ubuntu/myout/probeSet/, --startPhase=0, --tempDir=temp,
>> --userFeatures=/tmp/als/out/U/}
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>        at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:433)
>>        at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>>        at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>>        at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:113)
>>        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>>        at org.apache.mahout.utils.eval.ALSEvaluator.readMatrix(ALSEvaluator.java:113)
>>        at org.apache.mahout.utils.eval.ALSEvaluator.run(ALSEvaluator.java:71)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>        at org.apache.mahout.utils.eval.ALSEvaluator.main(ALSEvaluator.java:52)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:616)
>>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:616)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>>
>> There is no related exception in the Hadoop logs.
>>
>> I am running with java child opts of -Xmx2048M.
>>
>> Do you have any tips for me? Do you want me to post this into the
>> Mahout-542 newsgroup?
>>
>> thanks,
>>
>>
>> DB
>>
>>
>
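[Editor's note: the join Sebastian describes in the quoted message above can be sketched with plain Java collections standing in for the map and reduce phases. This is only an illustration of the idea, with made-up names and toy numbers, not the actual Mahout patch; a real job would also join the item vectors in a second, symmetric pass.]

```java
import java.util.*;

public class JoinErrorSketch {
    // Tagged record: either a user's feature vector ("U") or a probe
    // rating ("P"); the tag lets the reduce step tell them apart after
    // everything is grouped by user id. All names here are hypothetical.
    record Tagged(String tag, double[] features, int itemId, double rating) {}

    static double computeRmse() {
        // Toy feature vectors (D = 2) for two users and two items.
        Map<Integer, double[]> userFeatures = Map.of(
                0, new double[]{1.0, 0.5},
                1, new double[]{0.2, 0.8});
        Map<Integer, double[]> itemFeatures = Map.of(
                0, new double[]{2.0, 1.0},
                1, new double[]{1.0, 3.0});
        // Probe set: (userId, itemId) pairs with observed ratings.
        int[][] probes = {{0, 0}, {1, 1}};
        double[] ratings = {2.0, 3.0};

        // "Map" phase: key both the user vectors and the probes by userId,
        // so each user's vector and that user's probes land in the same
        // group -- on Hadoop, in the same reducer after the shuffle.
        Map<Integer, List<Tagged>> grouped = new HashMap<>();
        userFeatures.forEach((user, f) ->
                grouped.computeIfAbsent(user, k -> new ArrayList<>())
                       .add(new Tagged("U", f, -1, 0.0)));
        for (int i = 0; i < probes.length; i++) {
            grouped.computeIfAbsent(probes[i][0], k -> new ArrayList<>())
                   .add(new Tagged("P", null, probes[i][1], ratings[i]));
        }

        // "Reduce" phase: with the user vector at hand, predict each probe.
        // (A full job would join the item vectors the same way in a second
        // pass; here we just look them up to keep the sketch short.)
        double sumSq = 0.0;
        int n = 0;
        for (List<Tagged> group : grouped.values()) {
            double[] u = group.stream().filter(t -> t.tag().equals("U"))
                              .findFirst().orElseThrow().features();
            for (Tagged t : group) {
                if (!t.tag().equals("P")) continue;
                double[] m = itemFeatures.get(t.itemId());
                double pred = 0.0;
                for (int d = 0; d < u.length; d++) pred += u[d] * m[d];
                sumSq += (pred - t.rating()) * (pred - t.rating());
                n++;
            }
        }
        return Math.sqrt(sumSq / n);
    }

    public static void main(String[] args) {
        System.out.printf("probe RMSE = %.4f%n", computeRmse());
    }
}
```

The key point is that no feature matrix is ever held whole: each group carries only one user's vector plus that user's probes, so memory no longer grows with the full matrix size.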

Re: Fwd: Another mahout ALS question

Posted by Lance Norskog <go...@gmail.com>.
Thanks, this is really helpful.

On Fri, Mar 4, 2011 at 5:15 PM, Danny Bickson <da...@gmail.com> wrote:
> I completely agree, and I just fixed the graph to plot the full range of
> values.
> Thanks for the confirmation about the experiment setup. If anyone has
> further tips, I would love to hear them.
>
> - Danny
>
> On Fri, Mar 4, 2011 at 8:02 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> I generically dislike graphs with offset references.
>>
>> Your run-time graph has a baseline of 1500 seconds which makes it tricky
>> for the reader to understand your (entirely correct) statement that having
>> more than 8 machines isn't helpful.  The way you plotted the graph
>> coincidentally looks like nearly perfect speedup across the entire range.
>>
>> No comments about your setup.  My guess is that you could tune hadoop to
>> get a better result due to lower overheads but the results won't be
>> categorically different.  Iterative algorithms on stock hadoop are just
>> plain problematic.
>>
>>
>> On Fri, Mar 4, 2011 at 4:33 PM, Danny Bickson <da...@gmail.com>wrote:
>>
>>> I would love to get any feedback you or others may have about the setup of
>>> this experiment.
>>>
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Fwd: Another mahout ALS question

Posted by Danny Bickson <da...@gmail.com>.
I completely agree, and I just fixed the graph to plot the full range of
values.
Thanks for the confirmation about the experiment setup. If anyone has
further tips, I would love to hear them.

- Danny

On Fri, Mar 4, 2011 at 8:02 PM, Ted Dunning <te...@gmail.com> wrote:

> I generically dislike graphs with offset references.
>
> Your run-time graph has a baseline of 1500 seconds which makes it tricky
> for the reader to understand your (entirely correct) statement that having
> more than 8 machines isn't helpful.  The way you plotted the graph
> coincidentally looks like nearly perfect speedup across the entire range.
>
> No comments about your setup.  My guess is that you could tune hadoop to
> get a better result due to lower overheads but the results won't be
> categorically different.  Iterative algorithms on stock hadoop are just
> plain problematic.
>
>
> On Fri, Mar 4, 2011 at 4:33 PM, Danny Bickson <da...@gmail.com>wrote:
>
>> I would love to get any feedback you or others may have about the setup of
>> this experiment.
>>
>
>

Re: Fwd: Another mahout ALS question

Posted by Ted Dunning <te...@gmail.com>.
I generically dislike graphs with offset references.

Your run-time graph has a baseline of 1500 seconds which makes it tricky for
the reader to understand your (entirely correct) statement that having more
than 8 machines isn't helpful.  The way you plotted the graph coincidentally
looks like nearly perfect speedup across the entire range.

No comments about your setup.  My guess is that you could tune hadoop to get
a better result due to lower overheads but the results won't be
categorically different.  Iterative algorithms on stock hadoop are just
plain problematic.

On Fri, Mar 4, 2011 at 4:33 PM, Danny Bickson <da...@gmail.com>wrote:

> I would love to get any feedback you or others may have about the setup of
> this experiment.
>
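[Editor's note: Ted's point about offset baselines can be made concrete with invented numbers. The runtimes below are purely illustrative, not the measured results: once runtime flattens near a fixed overhead floor, a plot whose y-axis starts at 1500 s makes the stalled region still look like near-linear speedup.]

```java
public class OffsetBaseline {
    // True speedup relative to the single-machine runtime.
    static double speedup(double baseSeconds, double seconds) {
        return baseSeconds / seconds;
    }

    // "Apparent" speedup when the plot's y-axis starts at `offset` instead
    // of zero: the ratio of the visible bar heights, not of the runtimes.
    static double apparentSpeedup(double baseSeconds, double seconds, double offset) {
        return (baseSeconds - offset) / (seconds - offset);
    }

    public static void main(String[] args) {
        int[] machines = {1, 2, 4, 8, 16, 32, 64};
        // Hypothetical runtimes (seconds), invented for illustration:
        // near-linear scaling up to 8 machines, then a floor coming from
        // fixed per-iteration job-scheduling overhead.
        double[] runtime = {13500, 7500, 4500, 3000, 2600, 2400, 2300};
        for (int i = 0; i < machines.length; i++) {
            System.out.printf(
                "%2d machines: true speedup %4.1fx, apparent %5.1fx above a 1500 s baseline%n",
                machines[i],
                speedup(runtime[0], runtime[i]),
                apparentSpeedup(runtime[0], runtime[i], 1500));
        }
    }
}
```

With these numbers the 64-machine run is under a 6x true speedup, yet measured against a 1500 s baseline the bars shrink by 15x, which is exactly the distortion Ted describes.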

Re: Fwd: Another mahout ALS question

Posted by Danny Bickson <da...@gmail.com>.
Hi,
A quick update: I managed to squeeze out some more performance by tuning the
DFS block size and the number of mappers and reducers.
Now 16 HPC nodes (256 cores) seem to give the highest performance for both
alternating least squares and CoEM (an NLP algorithm).
I have posted some updated graphs at
http://bickson.blogspot.com/2011/03/tunning-hadoop-configuration-for-high.html

Unfortunately, I am out of EC2 budget for this experiment, so I will not be
able to fine-tune performance further.
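[Editor's note: for reference, settings of this kind lived in hdfs-site.xml and mapred-site.xml on the Hadoop 0.20-era clusters discussed in this thread. The property names below are the standard ones of that era; the values are placeholders, not the configuration actually used in the experiment.]

```xml
<!-- hdfs-site.xml: larger blocks mean fewer, longer-running map tasks -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128 MB; example value only -->
</property>

<!-- mapred-site.xml: task slots per node; tune to the node's core count -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>16</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048M</value> <!-- the child heap mentioned earlier in the thread -->
</property>
```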

- Danny

On Sun, Mar 6, 2011 at 4:10 PM, Danny Bickson <da...@gmail.com>wrote:

> Hi again,
> I think I found some problems in my setup and I will rerun the
> experiments soon.
> When using 32 or 64 machines, I think not enough mappers/reducers are
> allocated.
> Regarding the patch, I still need it: I ran all experiments with D=20, and
> with D=30 and above I get memory errors.
>
> Thanks!
>
>
> On Sun, Mar 6, 2011 at 4:02 PM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> Hi Danny,
>>
>> thanks for the nice writeup! I'm a little bit disappointed about the
>> performance though...
>>
>> Seems you got around those memory problems from last week without my
>> patch, which is good, since I unfortunately didn't have the time to finish
>> that one yet.
>>
>>
>>
>>
>>
>

Re: Fwd: Another mahout ALS question

Posted by Danny Bickson <da...@gmail.com>.
Hi again,
I think I found some problems in my setup and I will rerun the experiments
soon.
When using 32 or 64 machines, I think not enough mappers/reducers are
allocated.
Regarding the patch, I still need it: I ran all experiments with D=20, and
with D=30 and above I get memory errors.

Thanks!
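[Editor's note: a back-of-envelope estimate shows why the load-everything evaluator gets tighter as D grows. The dimensions below are the approximate Netflix matrix sizes; the 28 bytes-per-entry figure is a rough guess at the overhead of a hash-map-backed sparse vector, not a measurement.]

```java
public class HeapEstimate {
    // Crude per-entry cost of a hash-map-backed sparse vector: 4 bytes for
    // the int key, 8 for the double value, plus open-addressing slack and
    // object headers -- ~28 bytes/entry is a guess, not a measured figure.
    static final long BYTES_PER_ENTRY = 28L;

    static long approxBytes(long rows, int features) {
        return rows * features * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long users = 480_000, items = 17_700; // rough Netflix dimensions
        for (int d : new int[]{20, 30, 50}) {
            long total = approxBytes(users, d) + approxBytes(items, d);
            System.out.printf("D=%d: ~%d MB to hold U and M in memory%n",
                    d, total / (1024 * 1024));
        }
    }
}
```

Even a few hundred megabytes can be fatal here because the stack traces earlier in the thread show the evaluator running in the client JVM started by RunJar, where mapred.child.java.opts does not apply, and transient rehashing of the hash maps can roughly double the footprint on top of this estimate.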

On Sun, Mar 6, 2011 at 4:02 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Danny,
>
> thanks for the nice writeup! I'm a little bit disappointed about the
> performance though...
>
> Seems you got around those memory problems from last week without my patch,
> which is good, since I unfortunately didn't have the time to finish that one
> yet.
>
>
>
>
>

Re: Fwd: Another mahout ALS question

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Danny,

thanks for the nice writeup! I'm a little bit disappointed about the 
performance though...

Seems you got around those memory problems from last week without my 
patch, which is good, since I unfortunately didn't have the time to 
finish that one yet.




>                 at
>         org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>                 at
>         org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>                 at
>         org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>         Method)
>                 at
>         sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>                 at
>         sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>                 at java.lang.reflect.Method.invoke(Method.java:616)
>                 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>         THANKS!
>         ---------- Forwarded message ----------
>         From: *Danny Bickson* <danny.bickson@gmail.com>
>         Date: Wed, Feb 23, 2011 at 3:05 PM
>         Subject: Another mahout ALS question
>         To: ssc@apache.org
>
>
>         Hi!
>         I successfully ran 10 iterations of your ALS code with D=20 and
>         lambda=0.065, and I get a very impressive RMSE of 0.93.
>         However, when I try to increase D, I get various out-of-memory
>         errors, even with a small Netflix subsample of 3M values.
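[Editor's note: a rough back-of-the-envelope heap estimate shows why raising D
triggers the OOM when both feature matrices are loaded in one JVM. The Netflix
user/item counts below are approximate, and the 28-byte per-entry figure is an
assumed overhead for an int-to-double hash-map entry, not a measured one.]

```python
# Approximate heap needed to hold U and M fully in memory for evaluation.
num_users = 480_000        # ~Netflix user count (approximate)
num_items = 18_000         # ~Netflix movie count (approximate)
bytes_per_entry = 28       # assumed: key + value + hash-map overhead

def heap_mb(D):
    """Estimated MB for (num_users + num_items) dense vectors of D features."""
    return (num_users + num_items) * D * bytes_per_entry / 1024**2

for D in (20, 50, 100):
    print(f"D={D}: ~{heap_mb(D):.0f} MB")
```

On these assumptions the footprint grows linearly in D, so a run that fits at
D=20 can exhaust a -Xmx2048M heap well before D=100 once the rest of the job's
allocations are counted.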
>
>         One of the errors I am getting is in the evaluateALS step:
>         11/02/23 19:04:11 WARN driver.MahoutDriver: No evaluateALS.props
>         found
>         on classpath, will use command-line arguments only
>         11/02/23 19:04:12 INFO common.AbstractJob: Command line arguments:
>         {--endPhase=2147483647, --itemFeatures=/tmp/als/out/M/,
>         --probes=/user/ubuntu/myout/probeSet/, --startPhase=0,
>         --tempDir=temp,
>         --userFeatures=/tmp/als/out/U/}
>         Exception in thread "main" java.lang.OutOfMemoryError: GC
>         overhead limit
>         exceeded
>                  at
>         org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:433)
>                  at
>         org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>                  at
>         org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>                  at
>         org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:113)
>                  at
>         org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>                  at
>         org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>                  at
>         org.apache.mahout.utils.eval.ALSEvaluator.readMatrix(ALSEvaluator.java:113)
>                  at
>         org.apache.mahout.utils.eval.ALSEvaluator.run(ALSEvaluator.java:71)
>                  at
>         org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>                  at
>         org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>                  at
>         org.apache.mahout.utils.eval.ALSEvaluator.main(ALSEvaluator.java:52)
>                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>         Method)
>                  at
>         sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>                  at
>         sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>                  at java.lang.reflect.Method.invoke(Method.java:616)
>                  at
>         org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>                  at
>         org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>                  at
>         org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>         Method)
>                  at
>         sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>                  at
>         sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>                  at java.lang.reflect.Method.invoke(Method.java:616)
>                  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
>         There is no related exception in the Hadoop logs.
>
>         I am running with java child opts of -Xmx2048M.
>
>         Do you have any tips for me? Do you want me to post this into the
>         Mahout-542 newsgroup?
>
>         thanks,
>
>
>         DB
>
>
>