Posted to user@mahout.apache.org by jamborta <ja...@gmail.com> on 2010/02/19 23:51:06 UTC

item-based recommendation neighbourhood size

hi,

just wondering why there is no option to set the neighbourhood size for
item-based recommendation. I had a look at the implementation and it looks
like you take into account all items. is there a reason for that?

thanks
Tamas

-- 
View this message in context: http://old.nabble.com/item-based-recommendation-neighbourhood-size-tp27661482p27661482.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: item-based recommendation neighbourhood size

Posted by Gökhan Çapan <gk...@gmail.com>.

Sorry, "k most similar users" should be "k most similar items" in the 1st
step.

On Sun, Feb 21, 2010 at 1:12 PM, Gökhan Çapan <gk...@gmail.com> wrote:

> To reduce recommendation time when producing online recommendations, here
> are the steps I follow for a very large dataset:
>
>    1. I compute item-item similarities (for all item pairs that have been
>    rated by at least one common user), and after some optimizations (like
>    content boosting) I store the k most similar users, with their degree of
>    similarity, for each item.
>    2. At recommendation time, the system takes a user history vector as
>    input; it does not need to belong to one of the users in the dataset.
>    3. The algorithm looks at all items in the input vector and fetches their
>    most similar items from step 1. If one of the most similar items of an
>    item in the user history has not been rated by the user, it is added to
>    the recommendation list.
>    4. The list is sorted and the top n elements are recommended.
>
> The rating for a specific item is computed in a similar way. Also, if an
> item is among the most similar items of more than one item in the user
> history, it is more likely to be recommended.
>
> If you mean a system like this, I should say the implementation is mostly
> done via Mahout. The 1st step is computed using the mostSimilarItems
> function. The other steps are not from Mahout, but they are easy to
> implement.



-- 
Gökhan Çapan

Re: item-based recommendation neighbourhood size

Posted by Gökhan Çapan <gk...@gmail.com>.

To reduce recommendation time when producing online recommendations, here
are the steps I follow for a very large dataset:

   1. I compute item-item similarities (for all item pairs that have been
   rated by at least one common user), and after some optimizations (like
   content boosting) I store the k most similar users, with their degree of
   similarity, for each item.
   2. At recommendation time, the system takes a user history vector as
   input; it does not need to belong to one of the users in the dataset.
   3. The algorithm looks at all items in the input vector and fetches their
   most similar items from step 1. If one of the most similar items of an
   item in the user history has not been rated by the user, it is added to
   the recommendation list.
   4. The list is sorted and the top n elements are recommended.

The rating for a specific item is computed in a similar way. Also, if an
item is among the most similar items of more than one item in the user
history, it is more likely to be recommended.

If you mean a system like this, I should say the implementation is mostly
done via Mahout. The 1st step is computed using the mostSimilarItems
function. The other steps are not from Mahout, but they are easy to
implement.
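
The four steps above can be sketched roughly as follows. This is a minimal
illustration in plain Python, not Mahout code: the names build_top_k and
recommend, the dictionary layout, and the scoring rule (summing similarities
over the history items) are assumptions made for the example. The post only
says that the list is sorted and that items reachable from several history
items are more likely to be recommended. In line with the correction posted
in this thread, step 1 stores the k most similar items.

```python
from collections import defaultdict

def build_top_k(similarities, k):
    """Step 1: keep only the k most similar items for each item.

    `similarities` maps (item_a, item_b) -> similarity for pairs that share
    at least one rater; each pair is treated as symmetric.
    """
    neighbours = defaultdict(list)
    for (a, b), sim in similarities.items():
        neighbours[a].append((b, sim))
        neighbours[b].append((a, sim))
    return {
        item: sorted(cands, key=lambda pair: pair[1], reverse=True)[:k]
        for item, cands in neighbours.items()
    }

def recommend(history, top_k, n):
    """Steps 2-4: score unseen neighbours of the history items, return top n."""
    scores = defaultdict(float)
    for item in history:
        for other, sim in top_k.get(item, []):
            if other not in history:
                # An item reachable from several history items accumulates score.
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:n]]

# Hypothetical toy similarities, for illustration only.
sims = {("A", "B"): 0.9, ("A", "C"): 0.4, ("B", "C"): 0.8, ("B", "D"): 0.3}
top_k = build_top_k(sims, k=2)
print(recommend({"A", "B"}, top_k, n=2))
```

With k = 2, item D drops out of B's stored neighbours, so only C can be
recommended; a larger k would let D back in.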

On Sat, Feb 20, 2010 at 9:46 PM, Ted Dunning <te...@gmail.com> wrote:

> This is just one of an infinite number of variations on item-based
> recommendation.  The general idea is that you do some kind of magic to find
> item-item connections, you trim those to make it all work and then you
> recommend the items linked from the user's history of items they liked.  If
> the budget runs out (time, space or $), then you trim more.  All that the
> grouplens guys are saying is that trimming didn't hurt accuracy so it is
> probably good to do.
>
> The off-line connection finding can be done using LLR (for moderately high
> traffic situations), SVD (for cases where transitive dependencies are
> important), random indexing (poor man's SVD) or LDA (where small counts
> make
> SVD give crazy results).  There are many other possibilities as well.
>
> It would be great if you felt an itch to implement some of these and
> decided
> to scratch it and contribute the results back to Mahout.

-- 
Gökhan Çapan

Re: item-based recommendation neighbourhood size

Posted by Ted Dunning <te...@gmail.com>.

This is just one of an infinite number of variations on item-based
recommendation.  The general idea is that you do some kind of magic to find
item-item connections, you trim those to make it all work and then you
recommend the items linked from the user's history of items they liked.  If
the budget runs out (time, space or $), then you trim more.  All that the
grouplens guys are saying is that trimming didn't hurt accuracy so it is
probably good to do.

The off-line connection finding can be done using LLR (for moderately high
traffic situations), SVD (for cases where transitive dependencies are
important), random indexing (poor man's SVD) or LDA (where small counts make
SVD give crazy results).  There are many other possibilities as well.
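
As one concrete example of the off-line step, the log-likelihood ratio (LLR)
test scores how strongly two items co-occur, given a 2x2 table of counts.
The sketch below is plain Python written for this note, not Mahout's
implementation; the argument names k11..k22 and the sample counts are
assumptions chosen for illustration.

```python
import math

def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalised Shannon entropy of a collection of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both items, k12: only item A,
    k21: only item B,           k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

print(llr(10, 10, 10, 10))  # independent items: score near zero
print(llr(10, 0, 0, 10))    # items that always appear together: large score
```

Item pairs would then be trimmed by keeping only the highest-scoring
connections per item.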

It would be great if you felt an itch to implement some of these and decided
to scratch it and contribute the results back to Mahout.

On Sat, Feb 20, 2010 at 6:46 AM, jamborta <ja...@gmail.com> wrote:

>
> the basic concept of neighbourhood for item-based recommendation comes from
> this paper:
>
> http://portal.acm.org/citation.cfm?id=371920.372071
>
> this is the idea:
>
> "The fact that we only need a small fraction of similar items to compute
> predictions leads us to an alternate model-based scheme. In this scheme, we
> retain only a small number of similar items. For each item j we compute the
> k most similar items. We term k as the model size. Based on this model
> building step, our prediction generation algorithm works as follows. For
> generating predictions for a user u on item i, our algorithm  first
> retrieves the precomputed k most similar items corresponding to the target
> item i. Then it looks how many of those k items were purchased by the user
> u, based on this intersection then the prediction is computed using basic
> item-based collaborative filtering algorithm."
>
> --
> View this message in context:
> http://old.nabble.com/item-based-recommendation-neighbourhood-size-tp27661482p27666954.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
>

-- 
Ted Dunning, CTO
DeepDyve

Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

Yes OK I like the idea. If you can precompute these neighborhoods,
there is no problem of computing them at runtime, so performance isn't
such a big deal. It does add the necessary step of re-computing
neighborhoods sometimes.

I still imagine there is a sparseness problem: what if I haven't rated
anything in the item's neighborhood, but I have rated some items
outside its neighborhood? It's too bad I can't get any estimated
preference for that item, but, maybe it's not a very reliable estimate
or good recommendation anyway.

The problem lessens as the neighborhood grows but then you just
approach the current algorithm.
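
The sparseness concern can be made concrete with a small sketch: for a
target item, list which of its k nearest neighbours the user has actually
rated. The data and the helper name usable_neighbours are made up for
illustration.

```python
def usable_neighbours(ranked_neighbours, rated_items, k):
    """Neighbours within the top k that the user has rated.

    `ranked_neighbours` lists the items most similar to the target item,
    most similar first.
    """
    return [item for item in ranked_neighbours[:k] if item in rated_items]

neighbours_of_target = ["B", "C", "D", "E", "F"]
rated = {"E", "F"}

for k in (2, 4, 5):
    print(k, usable_neighbours(neighbours_of_target, rated, k))
```

With k = 2 the list is empty, so no estimate is possible for this user and
item. As k grows toward the full item set, every rated item is used again,
which is the behaviour described above as the current algorithm.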

I could imagine there is some sweet spot where the performance
overhead of re-computing neighborhoods is balanced out by better
recommendations (?) and faster runtime processing. I don't know
whether it would make better recommendations but have no reason to
think the right setting wouldn't. I'm also not sure there is a
performance issue for item-based recommendation -- it is generally
quite fast since it is generally used when you have some precomputed
item similarities to begin with.

I wouldn't discourage you from hacking it into the code and seeing how
it goes. If you find it has value, by all means let's bother to inject
this idea into the implementation. It's just a generalization of
what's there now.

Sean

On Sat, Feb 20, 2010 at 2:46 PM, jamborta <ja...@gmail.com> wrote:
>
> the basic concept of neighbourhood for item-based recommendation comes from
> this paper:
>
> http://portal.acm.org/citation.cfm?id=371920.372071
>
> this is the idea:
>
> "The fact that we only need a small fraction of similar items to compute
> predictions leads us to an alternate model-based scheme. In this scheme, we
> retain only a small number of similar items. For each item j we compute the
> k most similar items. We term k as the model size. Based on this model
> building step, our prediction generation algorithm works as follows. For
> generating predictions for a user u on item i, our algorithm  first
> retrieves the precomputed k most similar items corresponding to the target
> item i. Then it looks how many of those k items were purchased by the user
> u, based on this intersection then the prediction is computed using basic
> item-based collaborative filtering algorithm."
>

Re: item-based recommendation neighbourhood size

Posted by jamborta <ja...@gmail.com>.

the basic concept of neighbourhood for item-based recommendation comes from
this paper:

http://portal.acm.org/citation.cfm?id=371920.372071

this is the idea:

"The fact that we only need a small fraction of similar items to compute
predictions leads us to an alternate model-based scheme. In this scheme, we
retain only a small number of similar items. For each item j we compute the
k most similar items. We term k as the model size. Based on this model
building step, our prediction generation algorithm works as follows. For
generating predictions for a user u on item i, our algorithm first
retrieves the precomputed k most similar items corresponding to the target
item i. Then it looks how many of those k items were purchased by the user
u, based on this intersection then the prediction is computed using basic
item-based collaborative filtering algorithm."
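
The scheme quoted above can be sketched as follows. This is an illustrative
reading, not code from the paper or from Mahout: the function name predict,
the data layout, and the use of a similarity-weighted sum as the "basic
item-based" prediction step are assumptions.

```python
def predict(target, user_ratings, model):
    """Estimate a rating for `target` using only its precomputed neighbours.

    `model[target]` holds the k most similar items as (item, similarity)
    pairs; `user_ratings` maps item -> rating for one user.
    Returns None when the user has rated none of those neighbours.
    """
    overlap = [(sim, user_ratings[item])
               for item, sim in model.get(target, [])
               if item in user_ratings]
    weight = sum(abs(sim) for sim, _ in overlap)
    if weight == 0:
        return None
    return sum(sim * rating for sim, rating in overlap) / weight

# Hypothetical model with model size k = 3 for a single target item "i".
model = {"i": [("a", 0.5), ("b", 0.25), ("c", 0.5)]}
print(predict("i", {"a": 4.0, "c": 2.0, "z": 5.0}, model))
```

Item "z" is ignored because it is not among the k stored neighbours of "i",
even though the user rated it.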


Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

On Sat, Feb 20, 2010 at 1:05 PM, Claudio Martella
<cl...@tis.bz.it> wrote:
> Can't i recommend you the items around the items you like? why isn't it
> computing recommendations?

Yes. That is item-based recommendation. The method mostSimilarItems()
is not making recommendations though; recommend() is.

mostSimilarItems() indeed uses a notion of item neighborhood;
recommend() does not.

This is why I wasn't clear what the issue was, since the question
seemed to concern item neighborhoods and recommendation.

Re: item-based recommendation neighbourhood size

Posted by Claudio Martella <cl...@tis.bz.it>.

Sean Owen wrote:
> On Sat, Feb 20, 2010 at 12:46 PM, Claudio Martella
> <cl...@tis.bz.it> wrote:
>   
>> Can we rephrase jamborta's idea by saying that an item's neighborhood
>> can be created by putting
>> "around" an item all those items that have been rated by the same users
>> with similar ratings?
>>     
>
> Yes, that's how I would define a neighborhood around anything: things
> that are similar / closest. "Similar" could indeed be based on user
> ratings (this is what PearsonCorrelationSimilarity does).
>
>
>   
>> neighborhood for items. What you can do now is take your user's items
>> and see what's near them.
>>
>> This is probably a rephrase of user-based recommendation, though.
>>     
>
> I wouldn't say you're describing user-based recommendation. You've
> correctly described defining a neighborhood of similar items. This is
> how ItemBasedRecommender.mostSimilarItems() works, indeed.
>
> But that method is not computing recommendations.

Can't I recommend you the items around the items you like? Why isn't that
computing recommendations?

>  And I thought the
> question was, why doesn't an item neighborhood enter into a
> recommender computation? It doesn't at the moment, and I mentioned
> above how it could, but why it might not help or be beneficial. But I
> haven't tried it.
>
> But yes maybe that remains the issue here... what's the question
> exactly about item neighborhoods? Their definition is clear; the use
> is not as much.
>   

I agree.

-- 
Claudio Martella
Digital Technologies
Unit Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it

Short information regarding use of personal data. According to Section 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we process your personal data in order to fulfil contractual and fiscal obligations and also to send you information regarding our services and events. Your personal data are processed with and without electronic means and by respecting data subjects' rights, fundamental freedoms and dignity, particularly with regard to confidentiality, personal identity and the right to personal data protection. At any time and without formalities you can write an e-mail to privacy@tis.bz.it in order to object the processing of your personal data for the purpose of sending advertising materials and also to exercise the right to access personal data and other rights referred to in Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete information on the web site www.tis.bz.it.

Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

On Sat, Feb 20, 2010 at 12:46 PM, Claudio Martella
<cl...@tis.bz.it> wrote:
> Can we rephrase jamborta's idea by saying that an item's neighborhood
> can be created by putting
> "around" an item all those items that have been rated by the same users
> with similar ratings?

Yes, that's how I would define a neighborhood around anything: things
that are similar / closest. "Similar" could indeed be based on user
ratings (this is what PearsonCorrelationSimilarity does).

> neighborhood for items. What you can do now is take your user's items
> and see what's near them.
>
> This is probably a rephrase of user-based recommendation, though.

I wouldn't say you're describing user-based recommendation. You've
correctly described defining a neighborhood of similar items. This is
how ItemBasedRecommender.mostSimilarItems() works, indeed.

But that method is not computing recommendations. And I thought the
question was, why doesn't an item neighborhood enter into a
recommender computation? It doesn't at the moment, and I mentioned
above how it could, but also why it might not help or be beneficial. But I
haven't tried it.

But yes maybe that remains the issue here... what's the question
exactly about item neighborhoods? Their definition is clear; the use
is not as much.

Re: item-based recommendation neighbourhood size

Posted by Claudio Martella <cl...@tis.bz.it>.

I've also only recently gotten into the recommendation world, so I'll take
advantage of this thread.
Can we rephrase jamborta's idea by saying that an item's neighborhood can be
created by putting "around" an item all those items that have been rated by
the same users with similar ratings?
This assumption should make them similar. Each item is a point in a
multi-dimensional space where each dimension is a user and the "quantity" is
the rating. In that space you have the concept of neighborhood for items.
What you can do now is take your user's items and see what's near them.
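
The picture of items as points in a space whose dimensions are users can be
sketched like this. Pearson correlation is used here only as one possible
notion of "near" (it is the measure mentioned elsewhere in this thread); the
function below is plain Python written for illustration, with made-up
ratings, and is not Mahout's PearsonCorrelationSimilarity.

```python
import math

def pearson(item_a, item_b):
    """Pearson correlation of two items over the users who rated both.

    Each item is a dict user -> rating, i.e. a sparse vector whose
    dimensions are users. Returns None if the correlation is undefined.
    """
    common = sorted(item_a.keys() & item_b.keys())
    if len(common) < 2:
        return None
    xs = [item_a[u] for u in common]
    ys = [item_b[u] for u in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings: both items rise and fall together across u1..u3.
film_a = {"u1": 5, "u2": 3, "u3": 1}
film_b = {"u1": 5, "u2": 4, "u3": 3, "u4": 2}
print(pearson(film_a, film_b))
```

Only the users who rated both items contribute, so u4 has no effect on the
result.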

This is probably a rephrase of user-based recommendation, though.

Sean Owen wrote:
> You estimate a preference for each of those items, yes, in either
> user-based or item-based recommendation. In item-based recommendation,
> the estimate is a weighted average -- it's the user's preferences for
> various items, weighted by their similarity to the given item.
>
> In that case you don't need a neighborhood. The items of interest are
> the user's preferred items -- and you want to use all of them, not a
> subset.
>
> It's not quite symmetrical with user-based recommendation, which is
> based on user similarity. There, you need to constrain yourself to
> examine only a subset of all users, a neighborhood, or else it would
> be wildly inefficient.
>
> But in item-based recommendation you don't have this issue. *Given an
> item*, you already know the very small number of items it needs to be
> compared to -- the user's preferred items. That takes the place of a
> neighborhood in a sense.
>
> You could say, well, then the problem is elsewhere: how can
> considering all possible items for recommendation be efficient? if we
> use neighborhoods to get around that in user-based, why not
> item-based? In fact the algorithm doesn't actually look at every item
> -- it constructs a set of items that are at all connected to any item
> the user prefers, in order to rule out most items that can't possibly
> be recommended.
>
> In that sense a 'neighborhood' comes into play: the set of all items
> considered is really the union of all maximal neighborhoods around any
> item that the user prefers. That's a big neighborhood, and if this is
> what you mean, you are correct that you could reasonably add
> parameters to constrain that neighborhood.
>
> The reasons maybe you don't want to do that are:
>
> 1) Item similarity is often 'fast' in that it is sometimes precomputed
> based on outside information. So sorting through a lot of potential
> items doesn't hurt much.
>
> 2) It's not part of the canonical item-based algorithm, but that's not
> a great reason.
>
> 3) Computing this neighborhood gets expensive: it must be defined
> based on distance to all items in the set, not one. That is, being far
> from or near to one item doesn't mean anything by itself. It matters
> how close it is to the whole set. By the time you're computing that...
> might as well just use the canonical algorithm.

-- 
Claudio Martella
Digital Technologies
Unit Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it


Re: item-based recommendation neighbourhood size

Posted by jamborta <ja...@gmail.com>.

thanks a lot for the explanation. that makes sense. 





Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

You estimate a preference for each of those items, yes, in either
user-based or item-based recommendation. In item-based recommendation,
the estimate is a weighted average -- it's the user's preferences for
various items, weighted by their similarity to the given item.

In that case you don't need a neighborhood. The items of interest are
the user's preferred items -- and you want to use all of them, not a
subset.
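
A minimal sketch of the weighted average described above, using every item
the user has a preference for. It is written in plain Python for
illustration and is not Mahout's source: the name estimate_preference, the
similarity callback, and the toy similarity table are assumptions.

```python
def estimate_preference(target, user_prefs, similarity):
    """Weighted average of the user's ratings, weighted by similarity to `target`.

    Every item the user has rated takes part; there is no neighbourhood cut-off.
    Returns None when no rated item has a non-zero similarity to the target.
    """
    num = den = 0.0
    for item, rating in user_prefs.items():
        sim = similarity(target, item)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

# Hypothetical similarities between target item "x" and two rated items.
toy_sims = {("x", "a"): 0.75, ("x", "b"): 0.25}
sim = lambda i, j: toy_sims.get((i, j), 0.0)
print(estimate_preference("x", {"a": 5.0, "b": 1.0}, sim))
```

The loop runs over the user's own items, which is typically a short list,
so no extra neighbourhood is needed to keep it cheap.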

It's not quite symmetrical with user-based recommendation, which is
based on user similarity. There, you need to constrain yourself to
examine only a subset of all users, a neighborhood, or else it would
be wildly inefficient.

But in item-based recommendation you don't have this issue. *Given an
item*, you already know the very small number of items it needs to be
compared to -- the user's preferred items. That takes the place of a
neighborhood in a sense.

You could say, well, then the problem is elsewhere: how can
considering all possible items for recommendation be efficient? if we
use neighborhoods to get around that in user-based, why not
item-based? In fact the algorithm doesn't actually look at every item
-- it constructs a set of items that are at all connected to any item
the user prefers, in order to rule out most items that can't possibly
be recommended.
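
One way to build such a candidate set is sketched below. It assumes that
"connected" means "rated by at least one user who also rated an item the
user prefers"; that reading, the function name candidate_items and the data
layout are assumptions for illustration, not a description of the actual
implementation.

```python
def candidate_items(user_items, raters_of, items_of):
    """Items co-rated with anything the user prefers, minus what they already have.

    `raters_of` maps item -> set of users; `items_of` maps user -> set of items.
    """
    candidates = set()
    for item in user_items:
        for other_user in raters_of.get(item, ()):
            candidates |= items_of.get(other_user, set())
    return candidates - set(user_items)

# Hypothetical data: item "d" shares no rater with u1's items.
items_of = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
raters_of = {"a": {"u1"}, "b": {"u1", "u2"}, "c": {"u2"}, "d": {"u3"}}
print(candidate_items(items_of["u1"], raters_of, items_of))
```

Item "d" is never scored for u1 because nothing links it to the items u1
prefers.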

In that sense a 'neighborhood' comes into play: the set of all items
considered is really the union of all maximal neighborhoods around any
item that the user prefers. That's a big neighborhood, and if this is
what you mean, you are correct that you could reasonably add
parameters to constrain that neighborhood.

The reasons maybe you don't want to do that are:

1) Item similarity is often 'fast' in that it is sometimes precomputed
based on outside information. So sorting through a lot of potential
items doesn't hurt much.

2) It's not part of the canonical item-based algorithm, but that's not
a great reason.

3) Computing this neighborhood gets expensive: it must be defined
based on distance to all items in the set, not one. That is, being far
from or near to one item doesn't mean anything by itself. It matters
how close it is to the whole set. By the time you're computing that...
might as well just use the canonical algorithm.

On Sat, Feb 20, 2010 at 11:22 AM, jamborta <ja...@gmail.com> wrote:
>
> but as far as I understand your implementation you take user1 and then get
> all the items
> that the user hasn't rated (getAllOtherItems()) and generate recommendation
> for each of these items. therefore, you have user1 item1, user1 item2, etc
> as input. so the neighbourhood can be restricted for each of these items.
>
> Tamas

Re: item-based recommendation neighbourhood size

Posted by jamborta <ja...@gmail.com>.

but as far as I understand your implementation, you take user1 and then get
all the items that the user hasn't rated (getAllOtherItems()) and generate a
recommendation for each of these items. therefore, you have user1 item1,
user1 item2, etc. as input. so the neighbourhood can be restricted for each
of these items.

Tamas
 




Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

If you are making recommendations, then there is no item1 as input.
You're only given user1. This is true in user-based or item-based
recommendation.

You are right that if we just wanted to predict one rating, you would
have user1 and item1 as input. All of the existing recommender
implementations actually can do this through the estimatePreference()
method.

None work by computing a neighborhood of items, since that's not
suitable to make recommendations. However the item-based recommender
in the framework can tell you the items most similar to a given item.
You could use that on your own to perform the computation you are
thinking of.

You might run into the following issue: in a sparse data set, user1
might not have rated anything in the immediate neighborhood of item1.

Sean

On Sat, Feb 20, 2010 at 12:36 AM, jamborta <ja...@gmail.com> wrote:
>
> if we want to make a prediction for user1,item1 then it's the neighbourhood
> of item1.
>
> As you mentioned earlier I'm turning the classic recommender problem on its
> side. Maybe I don't understand the problem exactly, but most of the papers I
> read think along this line.

Re: item-based recommendation neighbourhood size

Posted by jamborta <ja...@gmail.com>.

if we want to make a prediction for user1,item1 then it's the neighbourhood
of item1.

As you mentioned earlier I'm turning the classic recommender problem on its
side. maybe i don't understand the problem exactly, but most of the papers I
read think along this line.



srowen wrote:
> 
> No that's not my understanding of how the canonical item-based
> algorithm operates. I am still not sure what neighborhood you are
> thinking of -- neighborhood of what item?
> 
> On Fri, Feb 19, 2010 at 11:59 PM, jamborta <ja...@gmail.com> wrote:
>>
>> but generally speaking item-based recommendation is just
>> looking at the same problem from a different point of view. the centre of
>> the neighbourhood
>> is the target item.
>>
>> besides, restricting the neighbourhood size seems to improve performance.
> 
> 


Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

No that's not my understanding of how the canonical item-based
algorithm operates. I am still not sure what neighborhood you are
thinking of -- neighborhood of what item?

On Fri, Feb 19, 2010 at 11:59 PM, jamborta <ja...@gmail.com> wrote:
>
> but generally speaking item-based recommendation is just
> looking at the same problem from a different point of view. the centre of
> the neighbourhood
> is the target item.
>
> besides, restricting the neighbourhood size seems to improve performance.

Re: item-based recommendation neighbourhood size

Posted by jamborta <ja...@gmail.com>.

but generally speaking item-based recommendation is just
looking at the same problem from a different point of view. the centre of
the neighbourhood
is the target item.

besides, restricting the neighbourhood size seems to improve performance.


srowen wrote:
> 
> Similar to *which* item?
> 
> It sounds like you are turning the classic recommender problem on its
> side. Given an *item*, find users who might like it. Yes, you can
> easily do that too, but by thinking of items as users and vice versa.
> 
> This is not what item-based recommenders do.
> 
> On Fri, Feb 19, 2010 at 11:32 PM, jamborta <ja...@gmail.com> wrote:
>>
>> what about constructing a neighborhood of
>> similar items and consider users who rated the target item?
>>
>> something like this paper:
>>
>> http://portal.acm.org/citation.cfm?id=371920.372071
>>
>>
>>
>> srowen wrote:
>>>
>>> It's an interesting question, and I think the quickest answer is --
>>> what item would be the center of the neighborhood? how would you
>>> incorporate a notion of neighborhood?
>>>
>>> For users, it's clear: you construct ahead of time a neighborhood of
>>> similar users and consider items they've rated. For item-based
>>> recommenders, this idea doesn't exist.
>>>
>>> It's easy to imagine new algorithms involving a neighborhood of items.
>>> Maybe I look at each item a user knows about, construct a neighborhood
>>> around that item and do something with it. These aren't canonical
>>> algorithms, but you could try them.
>>>
>>> Sean
>>>
>>> On Fri, Feb 19, 2010 at 10:51 PM, jamborta <ja...@gmail.com> wrote:
>>>>
>>>> hi,
>>>>
>>>> just wondering why there is no option to set the neighbourhood size for
>>>> item-based recommendation. I had a look at the implementation and it
>>>> looks
>>>> like you take into account all items. is there a reason for that?
>>>>
>>>> thanks
>>>> Tamas
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> 
> 


Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

Similar to *which* item?

It sounds like you are turning the classic recommender problem on its
side. Given an *item*, find users who might like it. Yes, you can
easily do that too, but by thinking of items as users and vice versa.

This is not what item-based recommenders do.
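
The "items as users and vice versa" idea amounts to transposing the preference data. A minimal sketch, assuming the ratings are held as a nested dictionary (the `sample` data and the `transpose` helper are hypothetical, not part of Mahout):

```python
# Hypothetical toy data: sample[user][item] = rating.
sample = {
    "user1": {"itemA": 5.0, "itemB": 3.0},
    "user2": {"itemA": 4.0, "item1": 5.0},
}


def transpose(ratings):
    """Swap the roles of users and items: result[item][user] = rating."""
    flipped = {}
    for user, prefs in ratings.items():
        for item, value in prefs.items():
            flipped.setdefault(item, {})[user] = value
    return flipped


by_item = transpose(sample)
```

Any routine written against the user-keyed layout can then be run on `by_item` unchanged, in which case it answers "which users for this item" instead of "which items for this user".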

On Fri, Feb 19, 2010 at 11:32 PM, jamborta <ja...@gmail.com> wrote:
>
> what about constructing a neighborhood of
> similar items and consider users who rated the target item?
>
> something like this paper:
>
> http://portal.acm.org/citation.cfm?id=371920.372071
>
>
>
> srowen wrote:
>>
>> It's an interesting question, and I think the quickest answer is --
>> what item would be the center of the neighborhood? how would you
>> incorporate a notion of neighborhood?
>>
>> For users, it's clear: you construct ahead of time a neighborhood of
>> similar users and consider items they've rated. For item-based
>> recommenders, this idea doesn't exist.
>>
>> It's easy to imagine new algorithms involving a neighborhood of items.
>> Maybe I look at each item a user knows about, construct a neighborhood
>> around that item and do something with it. These aren't canonical
>> algorithms, but you could try them.
>>
>> Sean
>>
>> On Fri, Feb 19, 2010 at 10:51 PM, jamborta <ja...@gmail.com> wrote:
>>>
>>> hi,
>>>
>>> just wondering why there is no option to set the neighbourhood size for
>>> item-based recommendation. I had a look at the implementation and it
>>> looks
>>> like you take into account all items. is there a reason for that?
>>>
>>> thanks
>>> Tamas
>>>
>>>
>>>
>>
>>
>
>
>

Re: item-based recommendation neighbourhood size

Posted by jamborta <ja...@gmail.com>.

what about constructing a neighborhood of
similar items and consider users who rated the target item?

something like this paper:

http://portal.acm.org/citation.cfm?id=371920.372071



srowen wrote:
> 
> It's an interesting question, and I think the quickest answer is --
> what item would be the center of the neighborhood? how would you
> incorporate a notion of neighborhood?
> 
> For users, it's clear: you construct ahead of time a neighborhood of
> similar users and consider items they've rated. For item-based
> recommenders, this idea doesn't exist.
> 
> It's easy to imagine new algorithms involving a neighborhood of items.
> Maybe I look at each item a user knows about, construct a neighborhood
> around that item and do something with it. These aren't canonical
> algorithms, but you could try them.
> 
> Sean
> 
> On Fri, Feb 19, 2010 at 10:51 PM, jamborta <ja...@gmail.com> wrote:
>>
>> hi,
>>
>> just wondering why there is no option to set the neighbourhood size for
>> item-based recommendation. I had a look at the implementation and it
>> looks
>> like you take into account all items. is there a reason for that?
>>
>> thanks
>> Tamas
>>
>>
>>
> 
> 


Re: item-based recommendation neighbourhood size

Posted by Sean Owen <sr...@gmail.com>.

It's an interesting question, and I think the quickest answer is --
what item would be the center of the neighborhood? how would you
incorporate a notion of neighborhood?

For users, it's clear: you construct ahead of time a neighborhood of
similar users and consider items they've rated. For item-based
recommenders, this idea doesn't exist.

It's easy to imagine new algorithms involving a neighborhood of items.
Maybe I look at each item a user knows about, construct a neighborhood
around that item and do something with it. These aren't canonical
algorithms, but you could try them.

Sean
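
One way the "neighborhood around each item a user knows about" idea could look is sketched below. It is a non-canonical illustration in plain Python, not Mahout code: the precomputed `neighbours` table, its similarity values, and the `recommend` helper are all assumptions. Each item the user has rated votes for the unrated items in its own neighbourhood, weighted by similarity times the user's rating.

```python
from collections import defaultdict

# Hypothetical precomputed table: for each item, its k most similar
# items with a similarity score (here k = 2).
neighbours = {
    "itemA": [("itemC", 0.9), ("itemD", 0.4)],
    "itemB": [("itemC", 0.5), ("itemE", 0.7)],
}


def recommend(user_ratings, neighbours, n):
    """Score unrated items found in the neighbourhoods of the user's
    rated items and return the top n item ids."""
    scores = defaultdict(float)
    for item, rating in user_ratings.items():
        for other, sim in neighbours.get(item, []):
            if other not in user_ratings:
                # An item that appears in several neighbourhoods
                # accumulates a higher score.
                scores[other] += sim * rating
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]


top = recommend({"itemA": 5.0, "itemB": 3.0}, neighbours, 2)
```

Here itemC sits in the neighbourhood of both rated items, so it collects two votes and ranks first. The size of each stored neighbourhood is the knob that trades coverage against computation.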

On Fri, Feb 19, 2010 at 10:51 PM, jamborta <ja...@gmail.com> wrote:
>
> hi,
>
> just wondering why there is no option to set the neighbourhood size for
> item-based recommendation. I had a look at the implementation and it looks
> like you take into account all items. is there a reason for that?
>
> thanks
> Tamas
>
>
>