Posted to user@spark.apache.org by Hiroyuki Yamada <mo...@gmail.com> on 2016/02/25 07:33:43 UTC

Re: What is the point of alpha value in Collaborative Filtering in MLlib ?

Hi, I've been doing a POC for CF in MLlib.
In my environment, the ratings are all implicit, so I'm trying to use the
trainImplicit method (in Python).

The trainImplicit method takes alpha as one of its arguments to specify a
confidence for the ratings, as described in
<http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>,
but the alpha value is global for all the ratings, so I am not sure why we
need it.
(If it were per rating, it would make sense to me.)
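
For context, here is a minimal sketch of the kind of call I mean (PySpark,
RDD-based MLlib API; the toy data and the alpha value are just placeholders):

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    sc = SparkContext(appName="implicit-als-sketch")

    # Implicit "ratings", e.g. view or purchase counts per (user, product).
    interactions = sc.parallelize([
        Rating(0, 10, 3.0),   # user 0 interacted with product 10 three times
        Rating(0, 11, 1.0),
        Rating(1, 10, 5.0),
        Rating(1, 12, 2.0),
    ])

    # alpha is a single, global hyperparameter here.
    model = ALS.trainImplicit(interactions, rank=10, iterations=10,
                              lambda_=0.01, alpha=40.0)

    print(model.recommendProducts(0, 3))  # top-3 products for user 0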

What difference does setting different alpha values make for exactly the
same data set?

I would really appreciate it if someone could give me a reasonable
explanation for this.

Best regards,
Hiro

Re: What is the point of alpha value in Collaborative Filtering in MLlib ?

Posted by Hiroyuki Yamada <mo...@gmail.com>.
Hello Sean,

Thank you very much for the quick response.
That helps me understand it a lot better!

Best regards,
Hiro

On Thu, Feb 25, 2016 at 6:59 PM, Sean Owen <so...@cloudera.com> wrote:

> This isn't specific to Spark; it's from the original paper.
>
> alpha doesn't do a whole lot, and it is a global hyperparam. It
> controls the relative weight of observed versus unobserved
> user-product interactions in the factorization. Higher alpha means
> it's much more important to faithfully reproduce the interactions that
> *did* happen as a "1" than to reproduce the interactions that *didn't*
> happen as a "0".
>
> I don't think there's a good rule of thumb about what value to pick;
> it can't be less than 0 (less than 1 doesn't make much sense either),
> and you might just try values between 1 and 100 to see what gives the
> best result.
>
> I think that generally sparser input needs a higher alpha, and maybe
> someone will tell me that alpha should really be a function of the
> sparsity, but I've never seen that done.
>
>
>
> On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yamada <mo...@gmail.com>
> wrote:
> > Hi, I've been doing a POC for CF in MLlib.
> > In my environment, the ratings are all implicit, so I'm trying to use the
> > trainImplicit method (in Python).
> >
> > The trainImplicit method takes alpha as one of its arguments to specify a
> > confidence for the ratings, as described in
> > <http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>,
> > but the alpha value is global for all the ratings, so I am not sure why we
> > need it.
> > (If it were per rating, it would make sense to me.)
> >
> > What difference does setting different alpha values make for exactly the
> > same data set?
> >
> > I would really appreciate it if someone could give me a reasonable
> > explanation for this.
> >
> > Best regards,
> > Hiro
>

Re: What is the point of alpha value in Collaborative Filtering in MLlib ?

Posted by Sean Owen <so...@cloudera.com>.
This isn't specific to Spark; it's from the original paper.

alpha doesn't do a whole lot, and it is a global hyperparam. It
controls the relative weight of observed versus unobserved
user-product interactions in the factorization. Higher alpha means
it's much more important to faithfully reproduce the interactions that
*did* happen as a "1" than to reproduce the interactions that *didn't*
happen as a "0".

I don't think there's a good rule of thumb about what value to pick;
it can't be less than 0 (less than 1 doesn't make much sense either),
and you might just try values between 1 and 100 to see what gives the
best result.
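
A rough sketch of that kind of sweep (PySpark; evaluate_ranking below is a
stand-in for whatever held-out ranking metric you use, such as MAP or
precision@k, not an MLlib function; train_interactions and
held_out_interactions are RDDs from your own split):

    from pyspark.mllib.recommendation import ALS

    def evaluate_ranking(model, held_out):
        # Placeholder: score the model on held-out interactions and
        # return a number where higher is better.
        raise NotImplementedError

    best_alpha, best_score = None, float("-inf")
    for alpha in (1.0, 5.0, 10.0, 40.0, 100.0):
        model = ALS.trainImplicit(train_interactions, rank=10,
                                  iterations=10, lambda_=0.01, alpha=alpha)
        score = evaluate_ranking(model, held_out_interactions)
        if score > best_score:
            best_alpha, best_score = alpha, score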

I think that generally sparser input needs a higher alpha, and maybe
someone will tell me that alpha should really be a function of the
sparsity, but I've never seen that done.



On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yamada <mo...@gmail.com> wrote:
> Hi, I've been doing a POC for CF in MLlib.
> In my environment, the ratings are all implicit, so I'm trying to use the
> trainImplicit method (in Python).
>
> The trainImplicit method takes alpha as one of its arguments to specify a
> confidence for the ratings, as described in
> <http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>,
> but the alpha value is global for all the ratings, so I am not sure why we
> need it.
> (If it were per rating, it would make sense to me.)
>
> What difference does setting different alpha values make for exactly the
> same data set?
>
> I would really appreciate it if someone could give me a reasonable
> explanation for this.
>
> Best regards,
> Hiro

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org