Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2010/10/21 04:56:23 UTC

Recommender system implementations

Since this is Recommender day, here is another kvetch:

The recommender implementations that have their own algorithms all do this in
Recommender.estimatePreference():
 public float estimatePreference(long userID, long itemID) throws
TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
      return actualPref;
    }
    return doEstimatePreference(userID, itemID);
  }

Meaning: "if I told you something, just parrot it back to me."
Otherwise, make a guess.

I am doing head-to-head comparisons of the DataModel preferences vs.
the Recommender. This code makes it impossible to directly compare
what the recommender thinks vs. the actual preference. If I wanted to
know what I told it, I already know that. I want to know what the
recommender thinks.

If this design decision is something y'all have argued about and
settled on, never mind. If it is just something that seemed like a
good idea at the time, can we change the recommenders, and the
Recommender "contract", to always use their own algorithm?

-- 
Lance Norskog
goksron@gmail.com

Re: Recommender system implementations

Posted by Sean Owen <sr...@gmail.com>.
Yes, I think it's a good idea for the reason Gabriel gave: it's the best
answer to give. I'm reluctant to change this behavior at this point, as this
part of the code is more mature and more widely used than others.

In the use case you reference, evaluation, there's already support for doing
this kind of testing automatically. The eval process will hold out data for
you and so on. That approach is more accurate. Yes, I could leave the data in
but not use that info in estimatePreference() directly -- but that info
would still be used indirectly in other places, like similarity
computations. The test becomes (in a smaller way) compromised anyway.
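
For reference, this is roughly what the built-in hold-out evaluation looks like.
A minimal sketch, assuming the Taste evaluator API of this era (class and method
names should be checked against the Mahout version actually in use), a ratings
file called ratings.csv, and SlopeOne as the recommender under test:

  import java.io.File;

  import org.apache.mahout.cf.taste.common.TasteException;
  import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
  import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
  import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.recommender.Recommender;

  public class HoldOutEvaluation {
    public static void main(String[] args) throws Exception {
      DataModel model = new FileDataModel(new File("ratings.csv"));

      // The evaluator builds a fresh recommender on the training split it creates.
      RecommenderBuilder builder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel trainingData) throws TasteException {
          return new SlopeOneRecommender(trainingData);
        }
      };

      // Train on 90% of each user's prefs, score predictions on the held-out 10%.
      RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
      double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
      System.out.println("Average absolute difference on held-out prefs: " + score);
    }
  }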

On Thu, Oct 21, 2010 at 3:56 AM, Lance Norskog <go...@gmail.com> wrote:

> Since this is Recommender day, here is another kvetch:
>
> The recommender implementations with algorithms all do this in
> Recommender.estimatePreference():
>  public float estimatePreference(long userID, long itemID) throws
> TasteException {
>    DataModel model = getDataModel();
>    Float actualPref = model.getPreferenceValue(userID, itemID);
>    if (actualPref != null) {
>      return actualPref;
>    }
>    return doEstimatePreference(userID, itemID);
>  }
>
> Meaning: "if I told you something, just parrot it back to me."
> Otherwise, make a guess.
>
> I am doing head-to-head comparisons of the dataModel preferences v.s.
> the Recommender. This code makes it impossible to directly compare
> what the recommender thinks v.s. the actual preference. If I wanted to
> know what I told it, I already know that. I want to know what the
> recommender thinks.
>
> If this design decision is something y'all have argued about and
> settled on, never mind. If it is just something that seemed like a
> good idea at the time, can we change the recommenders, and the
> Recommender "contract", to always use their own algorithm?
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Recommender system implementations

Posted by Gabriel Webster <ga...@htc.com>.
I wasn't part of arguing about and settling on this, but I still
think it's the right behavior. Firstly, the point of a recommender is
to maximize accuracy, so it makes sense to return the true rating if
it's known. Secondly, the only time actualPref would be non-null
is when you're testing on your training data; in any valid test, the
data points would be unseen and actualPref would always be null.
So I think that code is just there for real-world cases where you have to
display predicted ratings for any available item, not just items from
the test block, in which case you want to display the user's actual
rating for items the user has rated.


But more to the point, the only time there might be true ratings to
return at all is when you're testing on your training data, which isn't
a valid test in the first place.
On 10/21/10 10:56 AM, Lance Norskog wrote:
> Since this is Recommender day, here is another kvetch:
>
> The recommender implementations with algorithms all do this in
> Recommender.estimatePreference():
>   public float estimatePreference(long userID, long itemID) throws
> TasteException {
>      DataModel model = getDataModel();
>      Float actualPref = model.getPreferenceValue(userID, itemID);
>      if (actualPref != null) {
>        return actualPref;
>      }
>      return doEstimatePreference(userID, itemID);
>    }
>
> Meaning: "if I told you something, just parrot it back to me."
> Otherwise, make a guess.
>
> I am doing head-to-head comparisons of the dataModel preferences v.s.
> the Recommender. This code makes it impossible to directly compare
> what the recommender thinks v.s. the actual preference. If I wanted to
> know what I told it, I already know that. I want to know what the
> recommender thinks.
>
> If this design decision is something y'all have argued about and
> settled on, never mind. If it is just something that seemed like a
> good idea at the time, can we change the recommenders, and the
> Recommender "contract", to always use their own algorithm?
>

Re: Recommender system implementations

Posted by Ted Dunning <te...@gmail.com>.
Actually, this isn't the gold standard at all. Testing on your training data
will give you very misleading results, and many algorithms that do worse on
the training data will actually do much, much better on new data. That is
the whole point of avoiding over-fitting.

Test on held-out data for both the original and the derived models, just as
Sean suggested. To do anything else will be misleading at best.

On Thu, Oct 21, 2010 at 9:39 PM, Lance Norskog <go...@gmail.com> wrote:

> Now, obviously, the gold standard for recommendations is the data in
> the original model. So, I make recommendations from the original, and
> the derived, from the user/item prefs given in the original data. I
> don't really care about what the user gave as preferences: I want to
> know what the recommender algorithm itself thinks. But the
> recommenders just parrot back the data model instead of giving me
> their own opinion. Thus, the point of this whole thread. But how
> recommender algorithms work is a side issue. I'm trying to use them as
> an indirect measurement of something else.
>
> What is another way to test what I'm trying to test? What is another
> way to evaluate the quality of my derivation function?
>

Re: Recommender system implementations

Posted by Lance Norskog <go...@gmail.com>.
Here's the thing: my derivation creates complete recommendations, so
it is both a data model and a recommender itself.

OK, so I should separate the GroupLens data into training and test sets.
Then, I should transform the training set. I will run a SlopeOne
recommender against the original test set, and my recommender against the
same test set, and then compare the recommendations... somehow.

Sean, you are correct: the derivation gives ranking values in a
different space than the original data model. So I'm comparing the
order of recommendations. I'm trying a normal-scores approach because
it's easy:

http://comp9.psych.cornell.edu/Darlington/normscor.htm
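
A rank-based comparison along those lines could look like the sketch below. It
is a hypothetical helper, not anything in Mahout: Spearman's rho between the
orderings two recommenders produce for the same user over the same candidate
items (the recommenders and the item ID list are assumed to be supplied by the
caller). It only tells you something if both recommenders actually estimate
rather than echo known preferences.

  import java.util.Arrays;
  import java.util.Comparator;

  import org.apache.mahout.cf.taste.common.TasteException;
  import org.apache.mahout.cf.taste.recommender.Recommender;

  public final class RankAgreement {

    // Spearman's rho between the item orderings two recommenders produce for one user.
    public static double spearman(Recommender a, Recommender b, long userID, long[] itemIDs)
        throws TasteException {
      double[] rankA = ranks(a, userID, itemIDs);
      double[] rankB = ranks(b, userID, itemIDs);
      int n = itemIDs.length;
      double sumSquaredDiff = 0.0;
      for (int i = 0; i < n; i++) {
        double d = rankA[i] - rankB[i];
        sumSquaredDiff += d * d;
      }
      // Classic formula; assumes no ties among the estimated preferences.
      return 1.0 - (6.0 * sumSquaredDiff) / (n * ((double) n * n - 1.0));
    }

    // Rank of each item (1 = highest estimate) according to one recommender.
    private static double[] ranks(Recommender rec, long userID, long[] itemIDs)
        throws TasteException {
      int n = itemIDs.length;
      Integer[] order = new Integer[n];
      final double[] estimates = new double[n];
      for (int i = 0; i < n; i++) {
        order[i] = i;
        estimates[i] = rec.estimatePreference(userID, itemIDs[i]);
      }
      Arrays.sort(order, new Comparator<Integer>() {
        @Override
        public int compare(Integer x, Integer y) {
          return Double.compare(estimates[y], estimates[x]);  // descending estimate
        }
      });
      double[] rank = new double[n];
      for (int pos = 0; pos < n; pos++) {
        rank[order[pos]] = pos + 1;
      }
      return rank;
    }
  }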

However, at the moment I'm testing SlopeOne against random data to
get a baseline.

Lance

On Fri, Oct 22, 2010 at 12:58 PM, Federico Castanedo
<ca...@gmail.com> wrote:
> Hi Lance,
>
> IMHO I think the best way to compare how much information are you loosing from
> your derivative function is to perform a cross-validation scheme both
> in the original
> data set and on the derivative data set.
>
> But be sure to compare the same validation set of the two sets (the original and
> the derivative), I mean if you use and 80%-20% for training/validation
> with a 5 cross-validation
> scheme, be sure you are comparing the same sub-set of your two sets.
>
> Regards,
> Federico
>
> 2010/10/22 Sean Owen <sr...@gmail.com>:
>> Yah I still think held-out data is the best thing, if you want to use this
>> built-in evaluation mechanism. Hold out the same data from both models and
>> run the same test.
>>
>> There is another approach which doesn't necessarily require held-out data.
>> On the original, full model, just compute recommendations for any users you
>> like. Assume these are "correct". Then do the same for the derived model.
>>
>> It will return to you estimated preferences in both cases. You could use the
>> deltas as a measure of "error" (unless your derived model has quite a
>> different rating space).
>>
>> Or simply use the difference in rankings -- compute some metric that
>> penalizes having recommendations in different places in the ordering.
>>
>> I'll say I don't know which of these is most mathematically sound.
>> Interpreting the results may be hard. But, any of these should give a notion
>> of "better" and "worse".
>>
>>
>> Assuming the original model's recommendations are "correct" is a reasonably
>> big one. For example, the whole point of an SVD recommender is to modify the
>> model (reduce its dimension really) in order to be able to recommend items
>> that should be recommended, but weren't before due to model sparseness.
>> There, transforming the data in theory gives better results. That it's
>> different doesn't mean worse necessarily.
>>
>> But maybe that's not an issue for your use case, don't know.
>>
>>
>> On Fri, Oct 22, 2010 at 5:39 AM, Lance Norskog <go...@gmail.com> wrote:
>>
>>> Here is my use case: I have two data models.
>>> 1) the original data, for example GroupLens
>>> 2) the derivative. This is a second data model which is derived from
>>> the original. It is made with a one-way function from the master.
>>>
>>> I wish to measure how much information is lost in the derivation
>>> function. There is some entropy, so therefore the derived data model
>>> cannot supply recommendations as good as the original data. But how
>>> much worse?
>>>
>>> My naive method is to make recommendations using the master model, and
>>> the derived model, and compare them. If the recommendations from the
>>> derived model are, say, 90% as good as from the original data, then
>>> the derivation function is ok.
>>>
>>> Now, obviously, the gold standard for recommendations is the data in
>>> the original model. So, I make recommendations from the original, and
>>> the derived, from the user/item prefs given in the original data. I
>>> don't really care about what the user gave as preferences: I want to
>>> know what the recommender algorithm itself thinks. But the
>>> recommenders just parrot back the data model instead of giving me
>>> their own opinion. Thus, the point of this whole thread. But how
>>> recommender algorithms work is a side issue. I'm trying to use them as
>>> an indirect measurement of something else.
>>>
>>> What is another way to test what I'm trying to test? What is another
>>> way to evaluate the quality of my derivation function?
>>>
>>> On Wed, Oct 20, 2010 at 11:41 PM, Sebastian Schelter <ss...@apache.org>
>>> wrote:
>>> > Hi Lance,
>>> >
>>> > When evaluating a recommender you should split your dataset in a training
>>> > and test part. Only data from the training part should be included in
>>> your
>>> > DataModel and you only measure the accuracy of predicting  ratings that
>>> are
>>> > included in the test part (which is not  known by your recommender). If
>>> you
>>> > structure things this way, the current implementation should work fine
>>> for
>>> > you.
>>> >
>>> > --sebastian
>>> >
>>> > On 21.10.2010 04:56, Lance Norskog wrote:
>>> >>
>>> >> Since this is Recommender day, here is another kvetch:
>>> >>
>>> >> The recommender implementations with algorithms all do this in
>>> >> Recommender.estimatePreference():
>>> >>  public float estimatePreference(long userID, long itemID) throws
>>> >> TasteException {
>>> >>     DataModel model = getDataModel();
>>> >>     Float actualPref = model.getPreferenceValue(userID, itemID);
>>> >>     if (actualPref != null) {
>>> >>       return actualPref;
>>> >>     }
>>> >>     return doEstimatePreference(userID, itemID);
>>> >>   }
>>> >>
>>> >> Meaning: "if I told you something, just parrot it back to me."
>>> >> Otherwise, make a guess.
>>> >>
>>> >> I am doing head-to-head comparisons of the dataModel preferences v.s.
>>> >> the Recommender. This code makes it impossible to directly compare
>>> >> what the recommender thinks v.s. the actual preference. If I wanted to
>>> >> know what I told it, I already know that. I want to know what the
>>> >> recommender thinks.
>>> >>
>>> >> If this design decision is something y'all have argued about and
>>> >> settled on, never mind. If it is just something that seemed like a
>>> >> good idea at the time, can we change the recommenders, and the
>>> >> Recommender "contract", to always use their own algorithm?
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>>
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Recommender system implementations

Posted by Federico Castanedo <ca...@gmail.com>.
Hi Lance,

IMHO, the best way to measure how much information you are losing with
your derivation function is to run a cross-validation scheme on both the
original data set and the derived data set.

But be sure to compare the same validation set for the two data sets (the
original and the derived). I mean: if you use an 80%/20% training/validation
split with 5-fold cross-validation, make sure you are comparing the same
subsets of your two data sets.
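
One way to guarantee that, sketched below under assumed parameters (5 folds,
which makes each fold roughly an 80%/20% split): decide hold-out membership from
a deterministic function of (userID, itemID), so exactly the same pairs are held
out from both the original and the derived data model. This helper is hypothetical.

  // Hypothetical helper: a deterministic fold assignment per (user, item) pair,
  // so the original and the derived data set are split identically.
  public final class SharedSplit {

    private static final int FOLDS = 5;  // 5 folds ~= an 80%/20% split per fold

    // Fold (0..FOLDS-1) that this user/item pair belongs to.
    public static int fold(long userID, long itemID) {
      long h = userID * 31L + itemID;
      h ^= (h >>> 20) ^ (h >>> 12);      // mix the bits a little
      long m = h % FOLDS;
      return (int) (m < 0 ? m + FOLDS : m);
    }

    // True if the pair is held out (validation) for the given fold.
    public static boolean isHeldOut(long userID, long itemID, int currentFold) {
      return fold(userID, itemID) == currentFold;
    }
  }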

Regards,
Federico

2010/10/22 Sean Owen <sr...@gmail.com>:
> Yah I still think held-out data is the best thing, if you want to use this
> built-in evaluation mechanism. Hold out the same data from both models and
> run the same test.
>
> There is another approach which doesn't necessarily require held-out data.
> On the original, full model, just compute recommendations for any users you
> like. Assume these are "correct". Then do the same for the derived model.
>
> It will return to you estimated preferences in both cases. You could use the
> deltas as a measure of "error" (unless your derived model has quite a
> different rating space).
>
> Or simply use the difference in rankings -- compute some metric that
> penalizes having recommendations in different places in the ordering.
>
> I'll say I don't know which of these is most mathematically sound.
> Interpreting the results may be hard. But, any of these should give a notion
> of "better" and "worse".
>
>
> Assuming the original model's recommendations are "correct" is a reasonably
> big one. For example, the whole point of an SVD recommender is to modify the
> model (reduce its dimension really) in order to be able to recommend items
> that should be recommended, but weren't before due to model sparseness.
> There, transforming the data in theory gives better results. That it's
> different doesn't mean worse necessarily.
>
> But maybe that's not an issue for your use case, don't know.
>
>
> On Fri, Oct 22, 2010 at 5:39 AM, Lance Norskog <go...@gmail.com> wrote:
>
>> Here is my use case: I have two data models.
>> 1) the original data, for example GroupLens
>> 2) the derivative. This is a second data model which is derived from
>> the original. It is made with a one-way function from the master.
>>
>> I wish to measure how much information is lost in the derivation
>> function. There is some entropy, so therefore the derived data model
>> cannot supply recommendations as good as the original data. But how
>> much worse?
>>
>> My naive method is to make recommendations using the master model, and
>> the derived model, and compare them. If the recommendations from the
>> derived model are, say, 90% as good as from the original data, then
>> the derivation function is ok.
>>
>> Now, obviously, the gold standard for recommendations is the data in
>> the original model. So, I make recommendations from the original, and
>> the derived, from the user/item prefs given in the original data. I
>> don't really care about what the user gave as preferences: I want to
>> know what the recommender algorithm itself thinks. But the
>> recommenders just parrot back the data model instead of giving me
>> their own opinion. Thus, the point of this whole thread. But how
>> recommender algorithms work is a side issue. I'm trying to use them as
>> an indirect measurement of something else.
>>
>> What is another way to test what I'm trying to test? What is another
>> way to evaluate the quality of my derivation function?
>>
>> On Wed, Oct 20, 2010 at 11:41 PM, Sebastian Schelter <ss...@apache.org>
>> wrote:
>> > Hi Lance,
>> >
>> > When evaluating a recommender you should split your dataset in a training
>> > and test part. Only data from the training part should be included in
>> your
>> > DataModel and you only measure the accuracy of predicting  ratings that
>> are
>> > included in the test part (which is not  known by your recommender). If
>> you
>> > structure things this way, the current implementation should work fine
>> for
>> > you.
>> >
>> > --sebastian
>> >
>> > On 21.10.2010 04:56, Lance Norskog wrote:
>> >>
>> >> Since this is Recommender day, here is another kvetch:
>> >>
>> >> The recommender implementations with algorithms all do this in
>> >> Recommender.estimatePreference():
>> >>  public float estimatePreference(long userID, long itemID) throws
>> >> TasteException {
>> >>     DataModel model = getDataModel();
>> >>     Float actualPref = model.getPreferenceValue(userID, itemID);
>> >>     if (actualPref != null) {
>> >>       return actualPref;
>> >>     }
>> >>     return doEstimatePreference(userID, itemID);
>> >>   }
>> >>
>> >> Meaning: "if I told you something, just parrot it back to me."
>> >> Otherwise, make a guess.
>> >>
>> >> I am doing head-to-head comparisons of the dataModel preferences v.s.
>> >> the Recommender. This code makes it impossible to directly compare
>> >> what the recommender thinks v.s. the actual preference. If I wanted to
>> >> know what I told it, I already know that. I want to know what the
>> >> recommender thinks.
>> >>
>> >> If this design decision is something y'all have argued about and
>> >> settled on, never mind. If it is just something that seemed like a
>> >> good idea at the time, can we change the recommenders, and the
>> >> Recommender "contract", to always use their own algorithm?
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>

Re: Recommender system implementations

Posted by Sean Owen <sr...@gmail.com>.
Yah I still think held-out data is the best thing, if you want to use this
built-in evaluation mechanism. Hold out the same data from both models and
run the same test.

There is another approach which doesn't necessarily require held-out data.
On the original, full model, just compute recommendations for any users you
like. Assume these are "correct". Then do the same for the derived model.

It will return to you estimated preferences in both cases. You could use the
deltas as a measure of "error" (unless your derived model has quite a
different rating space).
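
A sketch of that delta idea, under the assumption that both recommenders can be
asked for estimates on the same user/item pairs (the pair arrays are
placeholders; in practice they would come from held-out data or sampled users):

  import org.apache.mahout.cf.taste.common.TasteException;
  import org.apache.mahout.cf.taste.recommender.Recommender;

  public final class EstimateDelta {

    // Mean absolute difference between two recommenders' estimates over the same pairs.
    public static double meanAbsoluteDelta(Recommender original, Recommender derived,
                                           long[] userIDs, long[] itemIDs) throws TasteException {
      double total = 0.0;
      int count = 0;
      for (int i = 0; i < userIDs.length; i++) {
        float a = original.estimatePreference(userIDs[i], itemIDs[i]);
        float b = derived.estimatePreference(userIDs[i], itemIDs[i]);
        if (!Float.isNaN(a) && !Float.isNaN(b)) {  // skip pairs either model cannot score
          total += Math.abs(a - b);
          count++;
        }
      }
      return count == 0 ? Double.NaN : total / count;
    }
  }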

Or simply use the difference in rankings -- compute some metric that
penalizes having recommendations in different places in the ordering.

I'll say I don't know which of these is most mathematically sound.
Interpreting the results may be hard. But, any of these should give a notion
of "better" and "worse".


Assuming the original model's recommendations are "correct" is a reasonably
big assumption, though. For example, the whole point of an SVD recommender is
to modify the model (reduce its dimension, really) in order to be able to
recommend items that should be recommended but weren't before, due to model
sparseness. There, transforming the data in theory gives better results. That
it's different doesn't necessarily mean it's worse.

But maybe that's not an issue for your use case, don't know.


On Fri, Oct 22, 2010 at 5:39 AM, Lance Norskog <go...@gmail.com> wrote:

> Here is my use case: I have two data models.
> 1) the original data, for example GroupLens
> 2) the derivative. This is a second data model which is derived from
> the original. It is made with a one-way function from the master.
>
> I wish to measure how much information is lost in the derivation
> function. There is some entropy, so therefore the derived data model
> cannot supply recommendations as good as the original data. But how
> much worse?
>
> My naive method is to make recommendations using the master model, and
> the derived model, and compare them. If the recommendations from the
> derived model are, say, 90% as good as from the original data, then
> the derivation function is ok.
>
> Now, obviously, the gold standard for recommendations is the data in
> the original model. So, I make recommendations from the original, and
> the derived, from the user/item prefs given in the original data. I
> don't really care about what the user gave as preferences: I want to
> know what the recommender algorithm itself thinks. But the
> recommenders just parrot back the data model instead of giving me
> their own opinion. Thus, the point of this whole thread. But how
> recommender algorithms work is a side issue. I'm trying to use them as
> an indirect measurement of something else.
>
> What is another way to test what I'm trying to test? What is another
> way to evaluate the quality of my derivation function?
>
> On Wed, Oct 20, 2010 at 11:41 PM, Sebastian Schelter <ss...@apache.org>
> wrote:
> > Hi Lance,
> >
> > When evaluating a recommender you should split your dataset in a training
> > and test part. Only data from the training part should be included in
> your
> > DataModel and you only measure the accuracy of predicting  ratings that
> are
> > included in the test part (which is not  known by your recommender). If
> you
> > structure things this way, the current implementation should work fine
> for
> > you.
> >
> > --sebastian
> >
> > On 21.10.2010 04:56, Lance Norskog wrote:
> >>
> >> Since this is Recommender day, here is another kvetch:
> >>
> >> The recommender implementations with algorithms all do this in
> >> Recommender.estimatePreference():
> >>  public float estimatePreference(long userID, long itemID) throws
> >> TasteException {
> >>     DataModel model = getDataModel();
> >>     Float actualPref = model.getPreferenceValue(userID, itemID);
> >>     if (actualPref != null) {
> >>       return actualPref;
> >>     }
> >>     return doEstimatePreference(userID, itemID);
> >>   }
> >>
> >> Meaning: "if I told you something, just parrot it back to me."
> >> Otherwise, make a guess.
> >>
> >> I am doing head-to-head comparisons of the dataModel preferences v.s.
> >> the Recommender. This code makes it impossible to directly compare
> >> what the recommender thinks v.s. the actual preference. If I wanted to
> >> know what I told it, I already know that. I want to know what the
> >> recommender thinks.
> >>
> >> If this design decision is something y'all have argued about and
> >> settled on, never mind. If it is just something that seemed like a
> >> good idea at the time, can we change the recommenders, and the
> >> Recommender "contract", to always use their own algorithm?
> >>
> >>
> >
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Recommender system implementations

Posted by Lance Norskog <go...@gmail.com>.
Here is my use case: I have two data models.
1) the original data, for example GroupLens
2) the derivative. This is a second data model which is derived from
the original. It is made with a one-way function from the master.

I wish to measure how much information is lost in the derivation
function. There is some entropy loss, so the derived data model
cannot supply recommendations as good as the original data. But how
much worse?

My naive method is to make recommendations using the master model, and
the derived model, and compare them. If the recommendations from the
derived model are, say, 90% as good as from the original data, then
the derivation function is ok.

Now, obviously, the gold standard for recommendations is the data in
the original model. So, I make recommendations from the original, and
the derived, from the user/item prefs given in the original data. I
don't really care about what the user gave as preferences: I want to
know what the recommender algorithm itself thinks. But the
recommenders just parrot back the data model instead of giving me
their own opinion. Thus, the point of this whole thread. But how
recommender algorithms work is a side issue. I'm trying to use them as
an indirect measurement of something else.

What is another way to test what I'm trying to test? What is another
way to evaluate the quality of my derivation function?

On Wed, Oct 20, 2010 at 11:41 PM, Sebastian Schelter <ss...@apache.org> wrote:
> Hi Lance,
>
> When evaluating a recommender you should split your dataset in a training
> and test part. Only data from the training part should be included in your
> DataModel and you only measure the accuracy of predicting  ratings that are
> included in the test part (which is not  known by your recommender). If you
> structure things this way, the current implementation should work fine for
> you.
>
> --sebastian
>
> On 21.10.2010 04:56, Lance Norskog wrote:
>>
>> Since this is Recommender day, here is another kvetch:
>>
>> The recommender implementations with algorithms all do this in
>> Recommender.estimatePreference():
>>  public float estimatePreference(long userID, long itemID) throws
>> TasteException {
>>     DataModel model = getDataModel();
>>     Float actualPref = model.getPreferenceValue(userID, itemID);
>>     if (actualPref != null) {
>>       return actualPref;
>>     }
>>     return doEstimatePreference(userID, itemID);
>>   }
>>
>> Meaning: "if I told you something, just parrot it back to me."
>> Otherwise, make a guess.
>>
>> I am doing head-to-head comparisons of the dataModel preferences v.s.
>> the Recommender. This code makes it impossible to directly compare
>> what the recommender thinks v.s. the actual preference. If I wanted to
>> know what I told it, I already know that. I want to know what the
>> recommender thinks.
>>
>> If this design decision is something y'all have argued about and
>> settled on, never mind. If it is just something that seemed like a
>> good idea at the time, can we change the recommenders, and the
>> Recommender "contract", to always use their own algorithm?
>>
>>
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Recommender system implementations

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Lance,

When evaluating a recommender, you should split your dataset into a
training part and a test part. Only data from the training part should be
included in your DataModel, and you should only measure the accuracy of
predicting ratings that are included in the test part (which is not
known by your recommender). If you structure things this way, the
current implementation should work fine for you.

--sebastian

On 21.10.2010 04:56, Lance Norskog wrote:
> Since this is Recommender day, here is another kvetch:
>
> The recommender implementations with algorithms all do this in
> Recommender.estimatePreference():
>   public float estimatePreference(long userID, long itemID) throws
> TasteException {
>      DataModel model = getDataModel();
>      Float actualPref = model.getPreferenceValue(userID, itemID);
>      if (actualPref != null) {
>        return actualPref;
>      }
>      return doEstimatePreference(userID, itemID);
>    }
>
> Meaning: "if I told you something, just parrot it back to me."
> Otherwise, make a guess.
>
> I am doing head-to-head comparisons of the dataModel preferences v.s.
> the Recommender. This code makes it impossible to directly compare
> what the recommender thinks v.s. the actual preference. If I wanted to
> know what I told it, I already know that. I want to know what the
> recommender thinks.
>
> If this design decision is something y'all have argued about and
> settled on, never mind. If it is just something that seemed like a
> good idea at the time, can we change the recommenders, and the
> Recommender "contract", to always use their own algorithm?
>
>