Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2014/11/04 00:20:30 UTC

Re: matrix factorization cross validation

I added drivers for precisionAt(k: Int) to the MovieLens test cases,
although I am a bit confused by the precisionAt(k: Int) code in
RankingMetrics.scala.

When cross-validating, I am really not sure how to set k:

if (labSet.nonEmpty) { val n = math.min(pred.length, k) ... }

If I make the cutoff a function of pred.length, i.e.
val n = math.min(pred.length, k*pred.length) with k as a fraction, then I
can vary k between 0 and 1 and choose the sweet spot for a given dataset,
but I am not sure whether that is a measure that makes sense for
recommendation...

MAP does make sense, since it is an average over the entire test set...
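
As a rough illustration of why MAP needs no cutoff, here is a minimal sketch in plain Scala. The names MapSketch, averagePrecision and meanAveragePrecision are hypothetical helpers working on in-memory collections, and normalising by the number of relevant items is an assumption (other normalisations are in use).

```scala
object MapSketch {
  // Average precision for one user: precision is sampled at every rank
  // where a relevant item appears, then normalised by the number of
  // relevant items (an assumption; other normalisations exist).
  def averagePrecision[T](pred: Seq[T], labels: Set[T]): Double = {
    if (labels.isEmpty) 0.0
    else {
      var hits = 0
      var precSum = 0.0
      pred.zipWithIndex.foreach { case (item, i) =>
        if (labels.contains(item)) {
          hits += 1
          precSum += hits.toDouble / (i + 1)
        }
      }
      precSum / labels.size
    }
  }

  // MAP: the mean of the per-user average precision over the test set.
  def meanAveragePrecision[T](users: Seq[(Seq[T], Set[T])]): Double =
    if (users.isEmpty) 0.0
    else users.map { case (p, l) => averagePrecision(p, l) }.sum / users.size
}
```

Because a hit at rank i contributes hits/(i+1), early hits count for more than late ones, and no per-user k has to be chosen.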

On Fri, Oct 31, 2014 at 1:26 AM, Sean Owen <so...@cloudera.com> wrote:

> No, excepting approximate methods like LSH to figure out the
> relatively small set of candidates for the users in the partition, and
> broadcast or join those.
>
> On Fri, Oct 31, 2014 at 5:45 AM, Nick Pentreath
> <ni...@gmail.com> wrote:
> > Sean, re my point earlier do you know a more efficient way to compute
> > top k for each user, other than to broadcast the item factors?
> >
> > (I guess one can use the new asymmetric lsh paper perhaps to assist)
> >
> >
> > On Thu, Oct 30, 2014 at 11:24 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> MAP is effectively an average over all k from 1 to min(#
> >> recommendations, # items rated). Getting the first recommendations
> >> right is more important than the last.
> >>
> >> On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das <debasish.das83@gmail.com> wrote:
> >> > Does it make sense to have a user-specific K, or is K considered the
> >> > same over all users?
> >> >
> >> > Intuitively, the users who watch more movies should get a higher K
> >> > than the others...
> >> >
> >
> >
>