Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2010/10/06 06:37:28 UTC

Recommenders and DataModels

I'm working with a DataModel that estimates preferences for all items
from any user. This does not seem to work well with the SlopeOne
recommender. Are there tips & tricks for making recommenders work well
with this class of model? That is, the sample DataModels all seem to
explicitly store items and only return those prefs. My model cheerfully
generates 1000 preferences if there are 1000 items.
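
For concreteness, a rough sketch of what this class of model behaves
like; the names here are purely illustrative, not the real Mahout
interfaces or my actual code:

    // Illustrative stand-in for a model that can synthesize a preference
    // for any (user, item) pair instead of looking one up from stored data.
    public class SyntheticPreferenceModel {

      private final long[] itemIDs;

      public SyntheticPreferenceModel(long[] itemIDs) {
        this.itemIDs = itemIDs;
      }

      // Never returns "unknown": every pair gets some synthetic score in [1, 5).
      public float estimatePreference(long userID, long itemID) {
        long h = userID * 31L + itemID;
        return 1.0f + Math.abs(h % 400) / 100.0f;
      }

      // With 1000 items this hands back 1000 preferences for any user at all.
      public float[] allPreferencesFor(long userID) {
        float[] prefs = new float[itemIDs.length];
        for (int i = 0; i < itemIDs.length; i++) {
          prefs[i] = estimatePreference(userID, itemIDs[i]);
        }
        return prefs;
      }
    }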

Thanks,

-- 
Lance Norskog
goksron@gmail.com

Re: Recommenders and DataModels

Posted by Sean Owen <sr...@gmail.com>.
Yeah, that's a good question! Most algorithms answer the question of
"what are the top N recommendations" by estimating unknown
preferences. If you already estimate all unknown preferences, then
there would be nothing left to recommend.
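
If all you want is a ranking out of your own estimates, you don't
strictly need a Recommender at all. A minimal sketch, where estimate()
is just a stand-in for whatever your DataModel synthesizes internally
(not a Mahout API):

    import java.util.Arrays;
    import java.util.Comparator;

    public class TopNFromEstimates {

      // Hypothetical synthetic estimator -- stands in for whatever the
      // DataModel does internally; not a Mahout class.
      static float estimate(long userID, long itemID) {
        long h = userID * 31L + itemID;
        return 1.0f + Math.abs(h % 400) / 100.0f;
      }

      // "Recommending" reduces to scoring every item and keeping the top n.
      static long[] topN(long userID, long[] itemIDs, int n) {
        Long[] items = Arrays.stream(itemIDs).boxed().toArray(Long[]::new);
        Arrays.sort(items, Comparator.comparingDouble(
            (Long item) -> estimate(userID, item)).reversed());
        long[] result = new long[Math.min(n, items.length)];
        for (int i = 0; i < result.length; i++) {
          result[i] = items[i];
        }
        return result;
      }

      public static void main(String[] args) {
        long[] itemIDs = new long[1000];
        for (int i = 0; i < itemIDs.length; i++) {
          itemIDs[i] = i;
        }
        System.out.println(Arrays.toString(topN(42L, itemIDs, 10)));
      }
    }

At that point SlopeOne (or any other Recommender) isn't adding
information; it would only be re-estimating values the model already
produces.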

On Wed, Oct 6, 2010 at 8:44 PM, Lance Norskog <go...@gmail.com> wrote:
> Since I have a synthetic predictor built into the DataModel, do I
> need a Recommender?

Re: Recommenders and DataModels

Posted by Lance Norskog <go...@gmail.com>.
Since I have a synthetic predictor built into the DataModel, do I
need a Recommender?

On Wed, Oct 6, 2010 at 5:20 AM, Sean Owen <sr...@gmail.com> wrote:
> Interesting question. So the preferences are synthetic in some cases -- you
> have a pref for every user-item combination? (Then what do you recommend? But
> I can imagine some answers.)
>
>
> By "not work well" do you mean performance or accuracy?
>
>
> For performance, yes, having very dense input will really slow down the
> pre-computation step, which is more or less linear in the size of the input.
> The resulting diffs table is usually dense-ish, since an entry exists any
> time two items co-occur; in this case it would be completely filled. This
> would also slow things down at runtime.
>
> This is all a symptom of having such dense data. One answer would be to
> 'prune' noise from your data (or generate less synthetic data, if I guess
> that right).
>
> Another answer is to prune the diffs table. The least interesting entries
> are those with the highest standard deviation. You could hack the code to trim
> based on that to get better runtime performance.
>
>
> If you mean accuracy, then one guess is that the big assumption that
> slope-one makes for the input isn't valid for your data. Slope-one assumes
> that the ratings for item X and item Y are linearly related: Y = mX + b.
> Rather than spend time regressing to determine m and b for each pair, which
> would be hugely expensive, it makes the reasonable assumption that m=1 in
> all cases. So the problem is vastly simpler: computing the best b = Y-X,
> which is just the average difference across all X / Y prefs.
>
> That's a good assumption for most "normal" scenarios. But to the extent it's
> systematically not true of your data, this will fall apart. Since I am
> guessing much of the data is synthetic, I wonder if there is some
> systematic incompatibility with this assumption.
>
>
> On Wed, Oct 6, 2010 at 5:37 AM, Lance Norskog <go...@gmail.com> wrote:
>
>> I'm working with a DataModel that estimates preferences for all items
>> from any user. This does not seem to work well with the SlopeOne
>> recommender. Are there tips & tricks for making recommenders work well
>> with this class of model? That is, the sample DataModels all seem to
>> explicitly store items and only return those prefs. My model cheerfully
>> generates 1000 preferences if there are 1000 items.
>>
>> Thanks,
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Recommenders and DataModels

Posted by Sean Owen <sr...@gmail.com>.
Interesting question. So the preferences are synthetic in some cases -- you
have a pref for every user-item combination? (Then what do you recommend? But
I can imagine some answers.)


By "not work well" do you mean performance or accuracy?


For performance, yes, having very dense input will really slow down the
pre-computation step, which is more or less linear in the size of the input.
The resulting diffs table is usually dense-ish, since an entry exists any
time two items co-occur; in this case it would be completely filled. This
would also slow things down at runtime.
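
To put rough numbers on that (back-of-the-envelope only; the 1000 items
comes from your example, the user count is just an assumption to make
the arithmetic concrete):

    public class DiffTableSize {
      public static void main(String[] args) {
        int numItems = 1000;      // from the example in the original question
        int numUsers = 10000;     // assumed, purely to make the arithmetic concrete

        // The diff table holds one entry per pair of items that co-occur in
        // some user's prefs. Fully dense data means every pair co-occurs:
        long entries = (long) numItems * (numItems - 1) / 2;
        System.out.println("diff table entries: " + entries);   // 499500

        // Building it touches every pair of prefs within each user, so a
        // user with a full row of 1000 prefs contributes ~500k pair updates:
        long updates = (long) numUsers * entries;
        System.out.println("pair updates during pre-computation: " + updates);
      }
    }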

This is all a symptom of having such dense data. One answer would be to
'prune' noise from your data (or generate less synthetic data, if I guess
that right).

Another answer is to prune the diffs table. The least interesting entries
are those with the highest standard deviation. You could hack the code to trim
based on that to get better runtime performance.
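
A sketch of what that kind of trimming could look like, outside
Mahout's actual DiffStorage classes (the structures here are simplified
stand-ins, not the real code):

    import java.util.HashMap;
    import java.util.Map;

    public class DiffTablePruning {

      // Running stats for the diff (prefY - prefX) of one item pair.
      static final class DiffStats {
        long count;
        double mean;
        double m2;   // sum of squared deviations (Welford's method)

        void add(double diff) {
          count++;
          double delta = diff - mean;
          mean += delta / count;
          m2 += delta * (diff - mean);
        }

        double stdDev() {
          return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
        }
      }

      // Keyed by "itemX:itemY"; a real implementation would use a
      // primitive-keyed structure, but a HashMap keeps the sketch short.
      private final Map<String, DiffStats> diffs = new HashMap<>();

      void addDiff(long itemX, long itemY, float prefX, float prefY) {
        diffs.computeIfAbsent(itemX + ":" + itemY, k -> new DiffStats())
             .add(prefY - prefX);
      }

      // The trimming idea from above: drop the entries whose diffs vary the
      // most, since a noisy average difference predicts the least.
      void pruneByStdDev(double maxStdDev) {
        diffs.values().removeIf(s -> s.stdDev() > maxStdDev);
      }
    }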


If you mean accuracy, then one guess is that the big assumption that
slope-one makes for the input isn't valid for your data. Slope-one assumes
that the ratings for item X and item Y are linearly related: Y = mX + b.
Rather than spend time regressing to determine m and b for each pair, which
would be hugely expensive, it makes the reasonable assumption that m=1 in
all cases. So the problem is vastly simpler: computing the best b = Y-X,
which is just the average difference across all X / Y prefs.
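
A tiny worked example of that, with made-up ratings:

    public class SlopeOneByHand {
      public static void main(String[] args) {
        // Made-up ratings from users who rated both item X and item Y.
        double[] x = {4.0, 3.0, 5.0};
        double[] y = {5.0, 4.0, 5.0};

        // Slope-one fixes m = 1 and only estimates b = average(Y - X).
        double b = 0.0;
        for (int i = 0; i < x.length; i++) {
          b += y[i] - x[i];
        }
        b /= x.length;                 // (1 + 1 + 0) / 3, about 0.67

        // Predict Y for a new user who rated X = 2:
        double predictedY = 2.0 + b;   // about 2.67
        System.out.println("b = " + b + ", predicted Y = " + predictedY);
      }
    }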

That's a good assumption for most "normal" scenarios. But to the extent it's
systematically not true of your data, this will fall apart. Since I am
guessing much of the data is synthetic, I wonder if there is some
systematic incompatibility with this assumption.


On Wed, Oct 6, 2010 at 5:37 AM, Lance Norskog <go...@gmail.com> wrote:

> I'm working with a DataModel that estimates preferences for all items
> from any user. This does not seem to work well with the SlopeOne
> recommender. Are there tips & tricks for making recommenders work well
> with this class of model? That is, the sample DataModels all seem to
> explicitly store items and only return those prefs. My model cheerfully
> generates 1000 preferences if there are 1000 items.
>
> Thanks,
>
> --
> Lance Norskog
> goksron@gmail.com
>