Posted to user@spark.apache.org by Nick Pentreath <ni...@gmail.com> on 2014/11/26 13:04:06 UTC

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

copying user group - I keep replying directly vs reply all :)

On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <ni...@gmail.com>
wrote:

> ALS will be guaranteed to decrease the squared error (therefore RMSE) in
> each iteration, on the *training* set.
>
> This does not hold for the *test* set / cross validation. You would
> expect the test set RMSE to stabilise as iterations increase, since the
> algorithm converges - but not necessarily to decrease.
>
> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kk...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am getting familiar with MLlib, and one thing I noticed is that
>> running the MovieLensALS example on the MovieLens dataset for an
>> increasing number of iterations does not decrease the RMSE.
>>
>> The results for a 0.6 (60%) training set and 0.4 (40%) test set are
>> below. With the training set at 0.8 (80%), the results are almost
>> identical. Shouldn't it be normal to see a decreasing error, especially
>> going from 1 to 5 iterations?
>>
>> Running 1 iterations
>> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
>> Running 5 iterations
>> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
>> Running 9 iterations
>> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
>> Running 13 iterations
>> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
>> Running 17 iterations
>> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
>> Running 21 iterations
>> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
>> Running 25 iterations
>> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
>> Running 29 iterations
>> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
>>
>> Thanks  a lot,
>> Kostas
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Sean Owen <so...@cloudera.com>.

Ah, of course. Great explanation. So I suppose you should see the desired
results with lambda = 0, although you generally don't want to set it
to 0.

On Wed, Nov 26, 2014 at 7:53 PM, Xiangrui Meng <me...@gmail.com> wrote:
> The training RMSE may increase due to regularization. Squared loss
> only represents part of the global loss. If you watch the sum of the
> squared loss and the regularization, it should be non-increasing.

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Debasish Das <de...@gmail.com>.

Running with lambda=0 makes the ALS code fail, since the matrices are no
longer guaranteed to stay positive definite and the Cholesky decomposition
fails.

Run with a very low lambda (I tested with 1e-4) and you should see the
decrease in RMSE that you expect.
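
One way the positive-definiteness problem can arise, sketched in plain Python
rather than Spark: when a user has fewer ratings than the rank, the
normal-equation matrix Y^T Y is singular, so a Cholesky factorization fails at
lambda = 0 and goes through once a small lambda is added to the diagonal. The
helper names (cholesky, normal_matrix) and the toy factors are made up for
illustration; this is not the MLlib code path, and it assumes an unscaled
lambda * I term.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a; raise ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def normal_matrix(item_factors, lam):
    """Build Y^T Y + lam * I from the factors of the items one user rated."""
    rank = len(item_factors[0])
    a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    for y in item_factors:
        for i in range(rank):
            for j in range(rank):
                a[i][j] += y[i] * y[j]
    return a


# Hypothetical user with a single rating and rank 2: Y^T Y has rank 1.
rated = [[1.0, 2.0]]
for lam in (0.0, 1e-4):
    try:
        cholesky(normal_matrix(rated, lam))
        print(f"lambda={lam}: Cholesky succeeded")
    except ValueError as err:
        print(f"lambda={lam}: Cholesky failed ({err})")
```

With lambda = 0 the second pivot is exactly zero for these factors, so the
factorization is rejected; with lambda = 1e-4 every pivot stays positive.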

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Kostas Kloudas <kk...@gmail.com>.

Thanks a lot for your time guys and your quick replies!

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Xiangrui Meng <me...@gmail.com>.

The training RMSE may increase due to regularization. Squared loss
only represents part of the global loss. If you watch the sum of the
squared loss and the regularization, it should be non-increasing.
-Xiangrui
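
A minimal sketch of this point, assuming a plain L2 penalty (MLlib may scale
lambda differently, so the numbers will not match Spark): each half-step of ALS
exactly minimizes squared loss + lambda * (sum of squared factor norms) over
one set of factors, so that sum cannot go up, while the squared loss alone (and
hence the training RMSE) carries no such guarantee. The function names and the
toy ratings are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for j in range(c, n + 1):
                    m[r][j] -= f * m[c][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, observed, lam, rank):
    """For each row, solve (Y^T Y + lam * I) x = Y^T r over that row's observed ratings."""
    out = {}
    for row, obs in observed.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, r in obs:
            y = fixed[col]
            for i in range(rank):
                b[i] += r * y[i]
                for j in range(rank):
                    a[i][j] += y[i] * y[j]
        out[row] = solve(a, b)
    return out


def run_als(ratings, rank=2, lam=0.5, iterations=10, seed=0):
    """Return [(train_rmse, objective)] after each full ALS iteration (lam must be > 0)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    history = []
    for _ in range(iterations):
        users = update(items, by_user, lam, rank)   # fix items, solve for users
        items = update(users, by_item, lam, rank)   # fix users, solve for items
        sq_loss = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
        penalty = lam * sum(dot(v, v) for f in (users, items) for v in f.values())
        history.append((math.sqrt(sq_loss / len(ratings)), sq_loss + penalty))
    return history


if __name__ == "__main__":
    rng = random.Random(42)
    ratings = [(u, i, float(rng.randint(1, 5)))
               for u in range(8) for i in range(6) if rng.random() < 0.6]
    for n, (rmse, objective) in enumerate(run_als(ratings), start=1):
        print(f"iter {n}: train RMSE = {rmse:.6f}  objective = {objective:.6f}")
```

Printing both columns side by side is the quickest way to check which of the
two quantities is actually monotone on a given dataset.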

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Sean Owen <so...@cloudera.com>.

I also modified the example to try 1, 5, 9, ... iterations as you did,
and also ran with the same default parameters. I used the
sample_movielens_data.txt file. Is that what you're using?

My result is:

Iteration 1 Test RMSE = 1.426079653593016 Train RMSE = 1.5013155094216357
Iteration 5 Test RMSE = 1.405598012724468 Train RMSE = 1.4847078708333596
Iteration 9 Test RMSE = 1.4055990901261632 Train RMSE = 1.484713206769993
Iteration 13 Test RMSE = 1.4055990999738366 Train RMSE = 1.4847132332994588
Iteration 17 Test RMSE = 1.40559910003368 Train RMSE = 1.48471323345531
Iteration 21 Test RMSE = 1.4055991000342158 Train RMSE = 1.4847132334567061
Iteration 25 Test RMSE = 1.4055991000342174 Train RMSE = 1.4847132334567108

Train error is consistently higher than test error, which could indicate
underfitting. A higher rank (rank=50) gives a more reasonable result, with
train error below test error:

Iteration 1 Test RMSE = 1.5981883186995312 Train RMSE = 1.4841671360432005
Iteration 5 Test RMSE = 1.5745145659678204 Train RMSE = 1.4672341345080382
Iteration 9 Test RMSE = 1.5745147110505406 Train RMSE = 1.4672385714907996
Iteration 13 Test RMSE = 1.5745147108258577 Train RMSE = 1.4672385929631868
Iteration 17 Test RMSE = 1.5745147108246424 Train RMSE = 1.4672385930428344
Iteration 21 Test RMSE = 1.5745147108246367 Train RMSE = 1.4672385930431973
Iteration 25 Test RMSE = 1.5745147108246367 Train RMSE = 1.467238593043199

I'm not sure what the difference is. I looked at your modifications
and they seem very similar. Is it the data you're using?


Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Kostas Kloudas <kk...@gmail.com>.

For training I am using the code in the MovieLensALS example with trainImplicit
set to false, and for the training RMSE I use:

val rmseTr = computeRmse(model, training, params.implicitPrefs)

The computeRmse() method is provided in the MovieLensALS class.
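
For readers without the example source at hand, the quantity being computed is
roughly the following. This is a plain-Python stand-in written for
illustration, not the actual computeRmse from MovieLensALS (which works on
Spark RDDs); compute_rmse and predict are hypothetical names.

```python
import math


def compute_rmse(predict, ratings):
    """RMSE of predict(user, item) against (user, item, rating) triples."""
    squared_errors = [(predict(u, i) - r) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy check with a constant predictor: the errors are -1 and +1, so the RMSE is 1.0.
training = [(0, 0, 5.0), (0, 1, 3.0)]
print(compute_rmse(lambda u, i: 4.0, training))  # prints 1.0
```

Calling it on the training triples gives the training RMSE and on the held-out
triples the test RMSE; the model and the formula are the same, only the data
differs.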


Thanks a lot,
Kostas


> On Nov 26, 2014, at 2:41 PM, Sean Owen <so...@cloudera.com> wrote:
>
> How are you computing RMSE?
> and how are you training the model -- not with trainImplicit right?
> I wonder if you are somehow optimizing something besides RMSE.
>
> On Wed, Nov 26, 2014 at 2:36 PM, Kostas Kloudas <kk...@gmail.com> wrote:
>> Once again, the error even with the training dataset increases. The results
>> are:
>>
>> Running 1 iterations
>> For 1 iter.: Test RMSE  = 1.2447121194304893  Training RMSE =
>> 1.2394166987104076 (34.751317636 s).
>> Running 5 iterations
>> For 5 iter.: Test RMSE  = 1.3253957117600659  Training RMSE =
>> 1.3206317416138509 (37.693118023000004 s).
>> Running 9 iterations
>> For 9 iter.: Test RMSE  = 1.3255293380139364  Training RMSE =
>> 1.3207661218210436 (41.046175661 s).
>> Running 13 iterations
>> For 13 iter.: Test RMSE  = 1.3255295352665748  Training RMSE =
>> 1.3207663201865092 (47.763619515 s).
>> Running 17 iterations
>> For 17 iter.: Test RMSE  = 1.32552953555787  Training RMSE =
>> 1.3207663204794406 (59.682361103000005 s).
>> Running 21 iterations
>> For 21 iter.: Test RMSE  = 1.3255295355583026  Training RMSE =
>> 1.3207663204798756 (57.210578232 s).
>> Running 25 iterations
>> For 25 iter.: Test RMSE  = 1.325529535558303  Training RMSE =
>> 1.3207663204798765 (65.785485882 s).
>>
>> Thanks a lot,
>> Kostas

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Sean Owen <so...@cloudera.com>.

How are you computing RMSE?
And how are you training the model -- not with trainImplicit, right?
I wonder if you are somehow optimizing something besides RMSE.
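
For reference, a minimal sketch of the RMSE computation being asked about. It assumes predictions and actual ratings are both keyed by (user, product) pairs. The function name `rmse` and the sample values are hypothetical and are not taken from the MovieLensALS example, and the sketch is plain Python rather than the example's Scala.

```python
import math

def rmse(predictions, ratings):
    """Root-mean-square error over the (user, product) pairs present in both dicts."""
    common = predictions.keys() & ratings.keys()
    if not common:
        raise ValueError("no overlapping (user, product) pairs")
    squared_error = sum((predictions[k] - ratings[k]) ** 2 for k in common)
    return math.sqrt(squared_error / len(common))

# Hypothetical ratings and predictions, for illustration only.
actual = {(1, 10): 4.0, (1, 11): 2.0, (2, 10): 5.0}
predicted = {(1, 10): 3.0, (1, 11): 2.0, (2, 10): 4.0}
print(rmse(predicted, actual))
```

If the evaluation joins predictions to ratings on a different key, or averages over a different count, the reported number would not be comparable to the quantity the training step minimizes.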

On Wed, Nov 26, 2014 at 2:36 PM, Kostas Kloudas <kk...@gmail.com> wrote:
> Once again, the error even with the training dataset increases. The results
> are:
>
> Running 1 iterations
> For 1 iter.: Test RMSE  = 1.2447121194304893  Training RMSE =
> 1.2394166987104076 (34.751317636 s).
> Running 5 iterations
> For 5 iter.: Test RMSE  = 1.3253957117600659  Training RMSE =
> 1.3206317416138509 (37.693118023000004 s).
> Running 9 iterations
> For 9 iter.: Test RMSE  = 1.3255293380139364  Training RMSE =
> 1.3207661218210436 (41.046175661 s).
> Running 13 iterations
> For 13 iter.: Test RMSE  = 1.3255295352665748  Training RMSE =
> 1.3207663201865092 (47.763619515 s).
> Running 17 iterations
> For 17 iter.: Test RMSE  = 1.32552953555787  Training RMSE =
> 1.3207663204794406 (59.682361103000005 s).
> Running 21 iterations
> For 21 iter.: Test RMSE  = 1.3255295355583026  Training RMSE =
> 1.3207663204798756 (57.210578232 s).
> Running 25 iterations
> For 25 iter.: Test RMSE  = 1.325529535558303  Training RMSE =
> 1.3207663204798765 (65.785485882 s).
>
> Thanks a lot,
> Kostas
>
> On Nov 26, 2014, at 12:04 PM, Nick Pentreath <ni...@gmail.com>
> wrote:
>
> copying user group - I keep replying directly vs reply all :)
>
> On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <ni...@gmail.com>
> wrote:
>>
>> ALS will be guaranteed to decrease the squared error (therefore RMSE) in
>> each iteration, on the training set.
>>
>> This does not hold for the test set / cross validation. You would expect
>> the test set RMSE to stabilise as iterations increase, since the algorithm
>> converges - but not necessarily to decrease.
>>
>> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kk...@gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I am getting familiarized with Mllib and a thing I noticed is that
>>> running the MovieLensALS
>>> example on the movieLens dataset for increasing number of iterations does
>>> not decrease the
>>> rmse.
>>>
>>> The results for 0.6% training set and 0.4% test are below. For training
>>> set to 0.8%, the results
>>> are almost identical. Shouldn’t it be normal to see a decreasing error?
>>> Especially going from 1 to 5 iterations.
>>>
>>> Running 1 iterations
>>> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
>>> Running 5 iterations
>>> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
>>> Running 9 iterations
>>> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
>>> Running 13 iterations
>>> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
>>> Running 17 iterations
>>> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
>>> Running 21 iterations
>>> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
>>> Running 25 iterations
>>> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
>>> Running 29 iterations
>>> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
>>>
>>> Thanks  a lot,
>>> Kostas
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>
>
>

Re: RMSE in MovieLensALS increases or stays stable as iterations increase.

Posted by Kostas Kloudas <kk...@gmail.com>.

Once again, the error increases even on the training dataset. The results are:

Running 1 iterations
For 1 iter.: Test RMSE  = 1.2447121194304893  Training RMSE = 1.2394166987104076 (34.751317636 s).
Running 5 iterations
For 5 iter.: Test RMSE  = 1.3253957117600659  Training RMSE = 1.3206317416138509 (37.693118023000004 s).
Running 9 iterations
For 9 iter.: Test RMSE  = 1.3255293380139364  Training RMSE = 1.3207661218210436 (41.046175661 s).
Running 13 iterations
For 13 iter.: Test RMSE  = 1.3255295352665748  Training RMSE = 1.3207663201865092 (47.763619515 s).
Running 17 iterations
For 17 iter.: Test RMSE  = 1.32552953555787  Training RMSE = 1.3207663204794406 (59.682361103000005 s).
Running 21 iterations
For 21 iter.: Test RMSE  = 1.3255295355583026  Training RMSE = 1.3207663204798756 (57.210578232 s).
Running 25 iterations
For 25 iter.: Test RMSE  = 1.325529535558303  Training RMSE = 1.3207663204798765 (65.785485882 s).
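
One possible reading of these numbers, offered as an assumption rather than a diagnosis: when a regularization term is present, each ALS half-step minimizes the regularized objective, so it is that objective, not the raw training RMSE, that cannot increase. The rank-1 sketch below uses a plain L2 penalty (which may differ from what MLlib applies) and hypothetical ratings. It records both quantities so they can be compared per iteration. `als_rank1` and its parameters are illustrative names, not part of any Spark API.

```python
import math
import random

def als_rank1(ratings, n_users, n_items, lam, iterations, seed=0):
    """Rank-1 ALS on observed (user, item, rating) triples with an L2 penalty.

    Returns one (regularized objective, training RMSE) pair per iteration.
    """
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_users)]  # user factors
    y = [rng.random() for _ in range(n_items)]  # item factors
    history = []
    for _ in range(iterations):
        # Fix item factors; each user factor has a closed-form ridge solution.
        for u in range(n_users):
            rated = [(i, r) for (uu, i, r) in ratings if uu == u]
            num = sum(r * y[i] for i, r in rated)
            den = sum(y[i] ** 2 for i, _ in rated) + lam
            x[u] = num / den
        # Fix user factors; solve for each item factor the same way.
        for i in range(n_items):
            rated = [(u, r) for (u, ii, r) in ratings if ii == i]
            num = sum(r * x[u] for u, r in rated)
            den = sum(x[u] ** 2 for u, _ in rated) + lam
            y[i] = num / den
        sq_err = sum((r - x[u] * y[i]) ** 2 for (u, i, r) in ratings)
        penalty = lam * (sum(v * v for v in x) + sum(v * v for v in y))
        history.append((sq_err + penalty, math.sqrt(sq_err / len(ratings))))
    return history

# Hypothetical ratings, for illustration only.
sample = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
          (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
for objective, train_rmse in als_rank1(sample, 3, 3, lam=1.0, iterations=10):
    print(objective, train_rmse)
```

Running this with a few values of `lam` shows how far the two columns can diverge; with `lam` close to zero the objective is nearly the squared error itself.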

Thanks a lot,
Kostas

> On Nov 26, 2014, at 12:04 PM, Nick Pentreath <ni...@gmail.com> wrote:
> 
> copying user group - I keep replying directly vs reply all :)
> 
> On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <ni...@gmail.com> wrote:
> ALS will be guaranteed to decrease the squared error (therefore RMSE) in each iteration, on the training set. 
> 
> This does not hold for the test set / cross validation. You would expect the test set RMSE to stabilise as iterations increase, since the algorithm converges - but not necessarily to decrease.
> 
> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kk...@gmail.com> wrote:
> Hi all,
> 
> I am getting familiarized with Mllib and a thing I noticed is that running the MovieLensALS
> example on the movieLens dataset for increasing number of iterations does not decrease the
> rmse.
> 
> The results for 0.6% training set and 0.4% test are below. For training set to 0.8%, the results
> are almost identical. Shouldn’t it be normal to see a decreasing error?
> Especially going from 1 to 5 iterations.
> 
> Running 1 iterations
> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
> Running 5 iterations
> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
> Running 9 iterations
> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
> Running 13 iterations
> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
> Running 17 iterations
> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
> Running 21 iterations
> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
> Running 25 iterations
> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
> Running 29 iterations
> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
> 
> Thanks  a lot,
> Kostas