Posted to user@spark.apache.org by Pasquinell Urbani <pa...@exalitica.com> on 2016/09/14 19:33:28 UTC

RMSE in ALS

Hi Community

I'm running ALS for retail product recommendation. Right now I'm
reaching rms_test = 2.3 and rmse_test = 32.5. Is this too much in your
experience? Is the transformation of the ranking values important for
having good errors?

Thank you all.

Pasquinell Urbani

Re: RMSE in ALS

Posted by Sean Owen <so...@cloudera.com>.
Yes, that's what TF-IDF is, but it's just a statistic and not a
ranking. If you're using that to fill in a user-item matrix then that
is your model; you don't need ALS. Building an ALS model on this is
kind of like building a model on a model. Applying RMSE in this case
is a little funny, given the distribution of TF-IDF values. It's hard
to say what's normal, but you're saying the test error is both 2.3 and
32.5. Regardless of which is really the test error, it indicates
something is wrong with the modeling process. These ought not to be too
different.
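
For reference, a minimal sketch of computing a single held-out RMSE with
the MLlib RDD API, so both numbers can be checked on the same footing
(model and test are placeholder names, not taken from the poster's code):

import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// model: the MatrixFactorizationModel returned by ALS; test: held-out RDD[Rating]
def testRmse(model: MatrixFactorizationModel, test: RDD[Rating]): Double = {
  val predictions = model.predict(test.map(r => (r.user, r.product)))
    .map(r => ((r.user, r.product), r.rating))
  val actuals = test.map(r => ((r.user, r.product), r.rating))
  val predictedAndObserved = predictions.join(actuals).values  // (predicted, observed)
  new RegressionMetrics(predictedAndObserved).rootMeanSquaredError
}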

On Wed, Sep 14, 2016 at 9:22 PM, Pasquinell Urbani
<pa...@exalitica.com> wrote:
> The implicit rankings are the output of TF-IDF, i.e.:
> Each_ranking = frequency of an item * log(total number of customers / number of
> customers who bought the item)
>
>
> On 14 Sep 2016 at 17:14, "Sean Owen" <so...@cloudera.com> wrote:
>>
>> What are implicit rankings here?
>> RMSE would not be an appropriate measure for comparing rankings. There are
>> ranking metrics like mean average precision that would be appropriate
>> instead.
>>
>> On Wed, Sep 14, 2016 at 9:11 PM, Pasquinell Urbani
>> <pa...@exalitica.com> wrote:
>>>
>>> It was a typo; both are RMSE.
>>>
>>> The frequency distribution of the rankings is the following:
>>>
>>> [image: inline image 2]
>>>
>>> As you can see, it has a heavy tail, but the majority of the observations
>>> lie near ranking 5.
>>>
>>> I'm working with implicit rankings (generated by TF-IDF); can this affect
>>> the error? (I'm currently using trainImplicit in ALS, Spark 1.6.2)
>>>
>>> Thank you.
>>>
>>>
>>>
>>> 2016-09-14 16:49 GMT-03:00 Sean Owen <so...@cloudera.com>:
>>>>
>>>> There is no way to answer this without knowing what your inputs are
>>>> like. If they're on the scale of thousands, that's small (good). If
>>>> they're on the scale of 1-5, that's extremely poor.
>>>>
>>>> What's RMS vs RMSE?
>>>>
>>>> On Wed, Sep 14, 2016 at 8:33 PM, Pasquinell Urbani
>>>> <pa...@exalitica.com> wrote:
>>>> > Hi Community
>>>> >
>>>> > I'm running ALS for retail product recommendation. Right now I'm
>>>> > reaching rms_test = 2.3 and rmse_test = 32.5. Is this too much in your
>>>> > experience? Is the transformation of the ranking values important for
>>>> > having good errors?
>>>> >
>>>> > Thank you all.
>>>> >
>>>> > Pasquinell Urbani
>>>
>>>
>>
>



Re: RMSE in ALS

Posted by Pasquinell Urbani <pa...@exalitica.com>.
The implicit rankings are the output of TF-IDF, i.e.:
Each_ranking = frequency of an item * log(total number of customers / number
of customers who bought the item)
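
In code, that weighting could be computed roughly as below (a sketch only;
the purchases RDD of (customer, item, purchaseCount) triples is an assumed
input shape, not the actual pipeline):

import org.apache.spark.rdd.RDD

// purchases: (customerId, itemId, purchaseCount) -- an assumed input shape
def tfIdfWeights(purchases: RDD[(Int, Int, Double)]): RDD[(Int, Int, Double)] = {
  val totalCustomers = purchases.map(_._1).distinct().count().toDouble
  // number of distinct customers who bought each item
  val buyersPerItem = purchases.map { case (c, i, _) => (i, c) }
    .distinct()
    .mapValues(_ => 1L)
    .reduceByKey(_ + _)
  purchases.map { case (c, i, count) => (i, (c, count)) }
    .join(buyersPerItem)
    .map { case (i, ((c, count), buyers)) =>
      (c, i, count * math.log(totalCustomers / buyers))
    }
}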

On 14 Sep 2016 at 17:14, "Sean Owen" <so...@cloudera.com> wrote:

> What are implicit rankings here?
> RMSE would not be an appropriate measure for comparing rankings. There are
> ranking metrics like mean average precision that would be appropriate
> instead.
>
> On Wed, Sep 14, 2016 at 9:11 PM, Pasquinell Urbani <
> pasquinell.urbani@exalitica.com> wrote:
>
>> It was a typo; both are RMSE.
>>
>> The frequency distribution of the rankings is the following:
>>
>> [image: inline image 2]
>>
>> As you can see, it has a heavy tail, but the majority of the observations
>> lie near ranking 5.
>>
>> I'm working with implicit rankings (generated by TF-IDF); can this affect
>> the error? (I'm currently using trainImplicit in ALS, Spark 1.6.2)
>>
>> Thank you.
>>
>>
>>
>> 2016-09-14 16:49 GMT-03:00 Sean Owen <so...@cloudera.com>:
>>
>>> There is no way to answer this without knowing what your inputs are
>>> like. If they're on the scale of thousands, that's small (good). If
>>> they're on the scale of 1-5, that's extremely poor.
>>>
>>> What's RMS vs RMSE?
>>>
>>> On Wed, Sep 14, 2016 at 8:33 PM, Pasquinell Urbani
>>> <pa...@exalitica.com> wrote:
>>> > Hi Community
>>> >
>>> > I'm running ALS for retail product recommendation. Right now I'm
>>> > reaching rms_test = 2.3 and rmse_test = 32.5. Is this too much in your
>>> > experience? Is the transformation of the ranking values important for
>>> > having good errors?
>>> >
>>> > Thank you all.
>>> >
>>> > Pasquinell Urbani
>>>
>>
>>
>

Re: RMSE in ALS

Posted by Sean Owen <so...@cloudera.com>.
What are implicit rankings here?
RMSE would not be an appropriate measure for comparing rankings. There are
ranking metrics like mean average precision that would be appropriate
instead.
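
For example, MLlib's RankingMetrics can score the top-k recommendations per
user with mean average precision; a minimal sketch (model, test and k are
placeholder names, not from the original code):

import org.apache.spark.mllib.evaluation.RankingMetrics
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

def meanAvgPrecision(model: MatrixFactorizationModel, test: RDD[Rating], k: Int): Double = {
  // items each user actually interacted with in the held-out set
  val actual = test.map(r => (r.user, r.product)).groupByKey().mapValues(_.toArray)
  // top-k recommended item ids per user, according to the model
  val recommended = model.recommendProductsForUsers(k).mapValues(_.map(_.product))
  new RankingMetrics(recommended.join(actual).values).meanAveragePrecision
}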

On Wed, Sep 14, 2016 at 9:11 PM, Pasquinell Urbani <
pasquinell.urbani@exalitica.com> wrote:

> It was a typo; both are RMSE.
>
> The frequency distribution of the rankings is the following:
>
> [image: inline image 2]
>
> As you can see, it has a heavy tail, but the majority of the observations
> lie near ranking 5.
>
> I'm working with implicit rankings (generated by TF-IDF); can this affect
> the error? (I'm currently using trainImplicit in ALS, Spark 1.6.2)
>
> Thank you.
>
>
>
> 2016-09-14 16:49 GMT-03:00 Sean Owen <so...@cloudera.com>:
>
>> There is no way to answer this without knowing what your inputs are
>> like. If they're on the scale of thousands, that's small (good). If
>> they're on the scale of 1-5, that's extremely poor.
>>
>> What's RMS vs RMSE?
>>
>> On Wed, Sep 14, 2016 at 8:33 PM, Pasquinell Urbani
>> <pa...@exalitica.com> wrote:
>> > Hi Community
>> >
>> > I'm running ALS for retail product recommendation. Right now I'm
>> > reaching rms_test = 2.3 and rmse_test = 32.5. Is this too much in your
>> > experience? Is the transformation of the ranking values important for
>> > having good errors?
>> >
>> > Thank you all.
>> >
>> > Pasquinell Urbani
>>
>
>

Re: RMSE in ALS

Posted by Pasquinell Urbani <pa...@exalitica.com>.
It was a typo; both are RMSE.

The frequency distribution of the rankings is the following:

[image: inline image 2]

As you can see, it has a heavy tail, but the majority of the observations
lie near ranking 5.

I'm working with implicit rankings (generated by TF-IDF); can this affect
the error? (I'm currently using trainImplicit in ALS, Spark 1.6.2)
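
For context, the call being described has roughly this shape (a sketch; the
rank, iterations, lambda and alpha values are illustrative placeholders, not
the settings actually used here):

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

// weights: (customer, item, tf-idf weight) triples as described above (assumed shape)
def trainModel(weights: RDD[(Int, Int, Double)]) = {
  val ratings = weights.map { case (user, item, w) => Rating(user, item, w) }
  // rank = 10, iterations = 10, lambda = 0.01, alpha = 40.0 are illustrative values only
  ALS.trainImplicit(ratings, 10, 10, 0.01, 40.0)
}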

Thank you.



2016-09-14 16:49 GMT-03:00 Sean Owen <so...@cloudera.com>:

> There is no way to answer this without knowing what your inputs are
> like. If they're on the scale of thousands, that's small (good). If
> they're on the scale of 1-5, that's extremely poor.
>
> What's RMS vs RMSE?
>
> On Wed, Sep 14, 2016 at 8:33 PM, Pasquinell Urbani
> <pa...@exalitica.com> wrote:
> > Hi Community
> >
> > I'm running ALS for retail product recommendation. Right now I'm
> > reaching rms_test = 2.3 and rmse_test = 32.5. Is this too much in your
> > experience? Is the transformation of the ranking values important for
> > having good errors?
> >
> > Thank you all.
> >
> > Pasquinell Urbani
>

Re: RMSE in ALS

Posted by Sean Owen <so...@cloudera.com>.
There is no way to answer this without knowing what your inputs are
like. If they're on the scale of thousands, that's small (good). If
they're on the scale of 1-5, that's extremely poor.
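
A quick way to see what scale the inputs are on (a sketch; ratings is a
placeholder for the input RDD[Rating]):

import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

// summary statistics of the raw values fed to ALS
def describeScale(ratings: RDD[Rating]): Unit = {
  val stats = ratings.map(_.rating).stats()  // count, mean, stdev, min, max
  println(s"min=${stats.min} max=${stats.max} mean=${stats.mean} stdev=${stats.stdev}")
}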

What's RMS vs RMSE?

On Wed, Sep 14, 2016 at 8:33 PM, Pasquinell Urbani
<pa...@exalitica.com> wrote:
> Hi Community
>
> I'm running ALS for retail product recommendation. Right now I'm
> reaching rms_test = 2.3 and rmse_test = 32.5. Is this too much in your
> experience? Is the transformation of the ranking values important for
> having good errors?
>
> Thank you all.
>
> Pasquinell Urbani
