Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2012/01/25 18:36:49 UTC

Re: Add on to itemsimilarity

(moving to user@)

I think I understand more about what you are doing. It doesn't quite make
sense to say you will train a recommender on the output of the recommender,
but I understand you to mean that you have some information about which
attractions or shows users have visited.

This is classic recommendation. You put that in, and it can tell you what
other attractions, shows, etc. the user may like.

So, going back to the beginning, I'm not yet clear on why that isn't already
the answer for you, since you have built this. Could you explain again what
else you are trying to do to filter or process the result?

On Wed, Jan 25, 2012 at 5:25 PM, Saikat Kanjilal <sx...@hotmail.com>wrote:

>
> Putting this back on the list: we want to recommend new items in the park.
> An item could be: 1) an attraction, 2) a restaurant, 3) a show, 4) a ride,
> 5) a resort.
> Our real data, if you will, is the set of recommendations that result from
> understanding their preferences in more detail, based on their reservations
> and resort stays.  So I wonder whether our real data is the training data
> that the recommender can train on and calculate predicted data from.
>
> Date: Wed, 25 Jan 2012 17:20:02 +0000
> Subject: Re: Add on to itemsimilarity
> From: srowen@gmail.com
> To: sxk1969@hotmail.com
>
> (do you mind putting this back on the list? might be a good discussion for
> others)
> What are you recommending to the user -- theme parks, rides at a theme
> park?
> Yes, you would always be recommending 'unknown' things to the user. You
> already 'know' how much they like or dislike the things for which you have
> data, so recommendations aren't of use to you.
>
> Of course, you can use both real and predicted data in your system -- it
> depends on what you are trying to accomplish. The recommender's role is
> creating the predicted data.
>
>
> On Wed, Jan 25, 2012 at 5:12 PM, Saikat Kanjilal <sx...@hotmail.com>
> wrote:
>
>
>
>
>
> Actually, let me be more clear: we are building a recommendation engine for
> a theme-park experience.  The user preferences are something we are storing
> based on the user's reservations and analytics; this is stored before the
> user rates any items, and may or may not have a direct relationship to the
> recommendations the user receives as they go around the park.  This is
> because those recommendations could be other rides or attractions that
> exist outside of the actual preferences.  It's not yet clear to me how to
> tie these preferences into the item-similarity results.
>
>

Re: Add on to itemsimilarity

Posted by Lance Norskog <go...@gmail.com>.
Another problem is that activity changes over time. The Netflix viewing
data showed a striking change in viewing patterns (I think sometime in
2004). Suppose you test 6 months of training data vs. the following
month of test data: Q3 and Q4 of 2003 vs. January 2004. Now do this
for every month in 2004-2005, rolling the training and test sets
forward month by month. You will see recommendation quality dip and
recover across this time, because the recent past activity stopped
predicting the future for several months.
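The rolling split described above can be sketched in plain Java. The Interaction and Split types here are hypothetical stand-ins (not Mahout classes); the only real logic is the month arithmetic:

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

public class RollingSplit {

  /** A minimal interaction record: who touched what, and in which month. */
  static class Interaction {
    final long userId;
    final long itemId;
    final YearMonth month;
    Interaction(long userId, long itemId, YearMonth month) {
      this.userId = userId; this.itemId = itemId; this.month = month;
    }
  }

  /** One train/test split of the rolling evaluation. */
  static class Split {
    final YearMonth testMonth;
    final List<Interaction> train = new ArrayList<>();
    final List<Interaction> test = new ArrayList<>();
    Split(YearMonth testMonth) { this.testMonth = testMonth; }
  }

  /**
   * For each candidate test month, train on the trainMonths months
   * immediately before it and test on the month itself, e.g. train on
   * Q3/Q4 2003 and test on January 2004; then roll forward month by month.
   */
  static List<Split> rollingSplits(List<Interaction> data,
                                   YearMonth firstTest, YearMonth lastTest,
                                   int trainMonths) {
    List<Split> splits = new ArrayList<>();
    for (YearMonth t = firstTest; !t.isAfter(lastTest); t = t.plusMonths(1)) {
      YearMonth trainStart = t.minusMonths(trainMonths);
      Split s = new Split(t);
      for (Interaction i : data) {
        if (!i.month.isBefore(trainStart) && i.month.isBefore(t)) {
          s.train.add(i);            // inside the training window
        } else if (i.month.equals(t)) {
          s.test.add(i);             // the held-out test month
        }
      }
      splits.add(s);
    }
    return splits;
  }
}
```

Plotting recommendation quality per Split against its test month is what would expose the dip-and-recover pattern.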

On Mon, Jan 30, 2012 at 10:31 AM, Ted Dunning <te...@gmail.com> wrote:
> I don't know that I have any secrets.
>
> I have observed garbage performance from recommenders based on behavior.
>  That performance got enormously better as we chose different behaviors to
> indicate engagement.
>
> As an example of what we looked at, consider a video site which records
> ratings, video views and 30 second (or more) video views.
>
> Ratings information was minute compared to the other data and thus had
> little value.  Many videos never had any ratings and the vast majority of
> all users never rated anything.  Even worse, it was impossible to ever
> detect any improvement in performance when we added ratings information.
>  Performance with ratings alone was not discernibly better than random
> recommendations.
>
> Video views was the largest data source and after the problems with the
> paucity of ratings it looked better.  Unfortunately, our users often
> clicked in videos due to misleading meta-data or because they were vaguely
> curious.  Neither of those situations represented an expression of user
> preference.  In practice, recommender performance with video views was
> better than random, but still pretty poor.
>
> 30 second video views produced very good results in spite of the fact that
> the data was 10x smaller than raw video views.  This was demonstrated by
> heuristic examination (aka the "laugh test") and by click-through and by
> user session length.  Mixing in video views degraded performance visibly.
>
>
> In building these systems, it was critical to incorporate a system like the
> LogLikelihoodSimilarity for building the item-item model.  Direct user
> based recommenders that used cosine and similar user-user metrics were
> laughably bad and were dominated by popular items.
>
> In earlier work at Musicmatch, we had similar results in that we had to
> carefully select which interactions we used as input to the recommender.
>  The overall process was much simpler, however, since we came closer to
> good results in our first tries.
>
>
>
>
> On Mon, Jan 30, 2012 at 1:43 PM, Lee Carroll
> <le...@googlemail.com>wrote:
>
>> >So I find that mental state estimations are the indirect way to model and
>> >predict behaviors while directly modeling behaviors based on observed
>> >behaviors is, well, more direct.
>>
>> That's a lovely switch :-) you should come and work for our business
>> unit, they would love you :-)
>>
>> However the experience of using page behaviour to recommend product
>> has been really disappointing
>> never out performing simple heuristics (and i mean really simple
>> market segmentation). Maybe we should look again
>> but having fallen for the engagement metric stuff once what would we
>> need to look out for to make it better ?
>> What's your secret Ted!



-- 
Lance Norskog
goksron@gmail.com

Re: Add on to itemsimilarity

Posted by Ted Dunning <te...@gmail.com>.
I don't know that I have any secrets.

I have observed garbage performance from recommenders based on behavior.
 That performance got enormously better as we chose different behaviors to
indicate engagement.

As an example of what we looked at, consider a video site which records
ratings, video views and 30 second (or more) video views.

Ratings information was minute compared to the other data and thus had
little value.  Many videos never had any ratings and the vast majority of
all users never rated anything.  Even worse, it was impossible to ever
detect any improvement in performance when we added ratings information.
 Performance with ratings alone was not discernibly better than random
recommendations.

Video views were the largest data source, and after the problems with the
paucity of ratings they looked better.  Unfortunately, our users often
clicked on videos due to misleading metadata or because they were vaguely
curious.  Neither of those situations represented an expression of user
preference.  In practice, recommender performance with video views was
better than random, but still pretty poor.

30 second video views produced very good results in spite of the fact that
the data was 10x smaller than raw video views.  This was demonstrated by
heuristic examination (aka the "laugh test") and by click-through and by
user session length.  Mixing in video views degraded performance visibly.


In building these systems, it was critical to incorporate a system like the
LogLikelihoodSimilarity for building the item-item model.  Direct user
based recommenders that used cosine and similar user-user metrics were
laughably bad and were dominated by popular items.

In earlier work at Musicmatch, we had similar results in that we had to
carefully select which interactions we used as input to the recommender.
 The overall process was much simpler, however, since we came closer to
good results in our first tries.
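The score behind LogLikelihoodSimilarity can be computed directly from the 2x2 contingency table of two items' co-occurrence counts. A standalone sketch of that calculation follows; the class and method names here are illustrative, not Mahout's own (Mahout ships its version in its math library):

```java
public class Llr {

  /** x * ln(x), with the conventional value 0 at x = 0. */
  static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  /** Unnormalized Shannon entropy of a list of counts. */
  static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0.0;
    for (long c : counts) {
      sum += c;
      sumXLogX += xLogX(c);
    }
    return xLogX(sum) - sumXLogX;
  }

  /**
   * Log-likelihood ratio for a 2x2 contingency table:
   *   k11 = users who interacted with both items A and B,
   *   k12 = users with A but not B, k21 = with B but not A,
   *   k22 = users with neither.
   * Larger scores mean the co-occurrence is less likely to be chance.
   */
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }
}
```

Because the score measures surprise rather than raw overlap, items that merely co-occur with everything popular score near zero, which is one reason this metric avoids the popular-item domination described above.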




On Mon, Jan 30, 2012 at 1:43 PM, Lee Carroll
<le...@googlemail.com>wrote:

> >So I find that mental state estimations are the indirect way to model and
> >predict behaviors while directly modeling behaviors based on observed
> >behaviors is, well, more direct.
>
> That's a lovely switch :-) you should come and work for our business
> unit, they would love you :-)
>
> However the experience of using page behaviour to recommend product
> has been really disappointing
> never out performing simple heuristics (and i mean really simple
> market segmentation). Maybe we should look again
> but having fallen for the engagement metric stuff once what would we
> need to look out for to make it better ?
> What's your secret Ted!

Re: Add on to itemsimilarity

Posted by Lee Carroll <le...@googlemail.com>.
Hi Ted,
>So I find that mental state estimations are the indirect way to model and
>predict behaviors while directly modeling behaviors based on observed
>behaviors is, well, more direct.

That's a lovely switch :-) you should come and work for our business
unit, they would love you :-)

However, the experience of using page behaviour to recommend products
has been really disappointing, never outperforming simple heuristics
(and I mean really simple market segmentation). Maybe we should look again,
but having fallen for the engagement-metric stuff once, what would we
need to look out for to make it better?
What's your secret, Ted?

Lee C

Re: Add on to itemsimilarity

Posted by Lee Carroll <le...@googlemail.com>.
Hi Anatoliy

We have separated the different forms of suggestions to users, mainly to
better explain why the user is being presented with these products:

- a section of CF recommendations (with explain info)
- a section of previous searches (including previous views)
- a section of content-based recommendations (this is a straight
  out-of-the-box mlt implementation and has the best click-through :-)

Now, I'm not saying a combined list would not work, but the danger is that
it loses its focus and confuses the user, or at least is not understandable
by them.

Is the need for a combined list a UI problem?



On 30 January 2012 10:26, Anatoliy Kats <a....@rambler-co.ru> wrote:
> It stands to reason that if you click on an Amazon book description, you
> should be offered to buy it next time.  Amazon certainly does.  I see Sean's
> point, in a pure form a recommender should only recommend unknown items.
>  Certainly that's the behavior you need in a theoretical test framework.  I
> think where we differ is that some people here build systems where ratings
> are computed from user behavior, and therefore decoupled from the set of
> candidate items.  I understand this is not Mahout's original purpose, and
> that it's difficult to build this support into Mahout in a principled way.
>  But it would be helpful to some of us if Mahout had that capability.  I
> accomplished it by wrapping my own recommender class around Mahout's
> delegate, overriding estimatePreference(), and using a CandidateSimilarity
> that allows previously rated items.  Perhaps a more acceptable solution is
> adding a post-recommendation processing step that combines the
> recommendation result with a set of userChoosableRecommendedItems, and
> returns the top N items from that combined list.  This design intrudes a lot
> less into Mahout's internals.
>
> Would anyone else benefit from this addition?
>
>
> On 01/29/2012 12:33 AM, Ted Dunning wrote:
>>
>> Also, Lee, I think you have it backwards.  It is true that clicks are not
>> the same thing as preferences, but I don't think that I give a fig about
>> preferences since they are the internal mental state of the visitor.  I
>> know that some of the visitors are mental, but I don't really care since
>> that is their own business.  What I care about is encouraging certain
>> behaviors.
>>
>> So I find that mental state estimations are the indirect way to model and
>> predict behaviors while directly modeling behaviors based on observed
>> behaviors is, well, more direct.
>>
>> This is compounded by the fact that asking people to rate things invokes a
>> really complicated social interaction which does not directly sample the
>> internal mental state (i.e. the real preference) but instead measures
>> behavior that is a very complicated outcome of social pressures,
>> expectations and the internal mental state.  So using ratings boils down
>> to
>> using one kind of behavior to estimate mental state that then is
>> hypothesized to result in the desired second kind of behavior.
>>
>>
>>
>> On Sat, Jan 28, 2012 at 10:51 AM, Sean Owen<sr...@gmail.com>  wrote:
>>
>>> It means *something* that a user clicked on one item and not 10,000
>>> others.
>>> You will learn things like that Star Wars and Star Trek are somehow
>>> related
>>> from this data. I don't think that clicks are a bad input per se.
>>>
>>> I agree that it's not obvious how to meaningfully translate user actions
>>> into a linear scale. "1" per click and "10" for purchase or something is
>>> a
>>> guess. I do think you will learn something from the data this way.
>>>
>>> There is nothing conceptually wrong with mixing real data and estimated
>>> data. If the results aren't looking right, it is not a problem with the
>>> concept, but the mapping of action onto some rating scale. I think it's
>>> hard to get that right, but is not impossible to get it "good".
>>>
>>> On Sat, Jan 28, 2012 at 10:15 AM, Lee Carroll
>>> <le...@googlemail.com>wrote:
>>>
>>>>> I would argue, though, that .recommend() is aimed at the latter task:
>>>>
>>>> No . I think the mismatch here is you are using at best a wild guess
>>>> at a preference for the convenience of using a recommender and then in
>>>> the same breath expecting the recommender to "understand" that you are
>>>> not using preferences at all and actually have no idea what the user
>>>> preference is. You cant have it both ways :-)
>>>>
>>>> A click through on an item is not a measure of user preference for
>>>> that item. I know its not what you want to hear (or better what your
>>>> business users want to here) but there it is.
>>>>
>>>> We can pretend, or maybe even build a convincing narrative that a
>>>> click is some sort of item association and use that as a proxy
>>>> preference and we might even get some mileage out of it, but we should
>>>> not change the behaviour of the .recommend() to hide its short
>>>> comings.
>>>>
>

Re: Add on to itemsimilarity

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Good explanation; this is roughly what I was trying to do by making assumptions from user preferences and returning the intersection of those with the item-similarity list.  Yes, I also think this would be a valuable addition.

Sent from my iPhone

On Jan 30, 2012, at 2:26 AM, Anatoliy Kats <a....@rambler-co.ru> wrote:

> It stands to reason that if you click on an Amazon book description, you should be offered to buy it next time.  Amazon certainly does.  I see Sean's point, in a pure form a recommender should only recommend unknown items.  Certainly that's the behavior you need in a theoretical test framework.  I think where we differ is that some people here build systems where ratings are computed from user behavior, and therefore decoupled from the set of candidate items.  I understand this is not Mahout's original purpose, and that it's difficult to build this support into Mahout in a principled way.  But it would be helpful to some of us if Mahout had that capability.  I accomplished it by wrapping my own recommender class around Mahout's delegate, overriding estimatePreference(), and using a CandidateSimilarity that allows previously rated items.  Perhaps a more acceptable solution is adding a post-recommendation processing step that combines the recommendation result with a set of userChoosableRecommendedItems, and returns the top N items from that combined list.  This design intrudes a lot less into Mahout's internals.
> 
> Would anyone else benefit from this addition?
> 
> On 01/29/2012 12:33 AM, Ted Dunning wrote:
>> Also, Lee, I think you have it backwards.  It is true that clicks are not
>> the same thing as preferences, but I don't think that I give a fig about
>> preferences since they are the internal mental state of the visitor.  I
>> know that some of the visitors are mental, but I don't really care since
>> that is their own business.  What I care about is encouraging certain
>> behaviors.
>> 
>> So I find that mental state estimations are the indirect way to model and
>> predict behaviors while directly modeling behaviors based on observed
>> behaviors is, well, more direct.
>> 
>> This is compounded by the fact that asking people to rate things invokes a
>> really complicated social interaction which does not directly sample the
>> internal mental state (i.e. the real preference) but instead measures
>> behavior that is a very complicated outcome of social pressures,
>> expectations and the internal mental state.  So using ratings boils down to
>> using one kind of behavior to estimate mental state that then is
>> hypothesized to result in the desired second kind of behavior.
>> 
>> 
>> 
>> On Sat, Jan 28, 2012 at 10:51 AM, Sean Owen<sr...@gmail.com>  wrote:
>> 
>>> It means *something* that a user clicked on one item and not 10,000 others.
>>> You will learn things like that Star Wars and Star Trek are somehow related
>>> from this data. I don't think that clicks are a bad input per se.
>>> 
>>> I agree that it's not obvious how to meaningfully translate user actions
>>> into a linear scale. "1" per click and "10" for purchase or something is a
>>> guess. I do think you will learn something from the data this way.
>>> 
>>> There is nothing conceptually wrong with mixing real data and estimated
>>> data. If the results aren't looking right, it is not a problem with the
>>> concept, but the mapping of action onto some rating scale. I think it's
>>> hard to get that right, but is not impossible to get it "good".
>>> 
>>> On Sat, Jan 28, 2012 at 10:15 AM, Lee Carroll
>>> <le...@googlemail.com>wrote:
>>> 
>>>>> I would argue, though, that .recommend() is aimed at the latter task:
>>>> No . I think the mismatch here is you are using at best a wild guess
>>>> at a preference for the convenience of using a recommender and then in
>>>> the same breath expecting the recommender to "understand" that you are
>>>> not using preferences at all and actually have no idea what the user
>>>> preference is. You cant have it both ways :-)
>>>> 
>>>> A click through on an item is not a measure of user preference for
>>>> that item. I know its not what you want to hear (or better what your
>>>> business users want to here) but there it is.
>>>> 
>>>> We can pretend, or maybe even build a convincing narrative that a
>>>> click is some sort of item association and use that as a proxy
>>>> preference and we might even get some mileage out of it, but we should
>>>> not change the behaviour of the .recommend() to hide its short
>>>> comings.
>>>> 
> 
> 

Re: Add on to itemsimilarity

Posted by Anatoliy Kats <a....@rambler-co.ru>.
It stands to reason that if you click on an Amazon book description, you 
should be offered to buy it next time.  Amazon certainly does.  I see 
Sean's point, in a pure form a recommender should only recommend unknown 
items.  Certainly that's the behavior you need in a theoretical test 
framework.  I think where we differ is that some people here build 
systems where ratings are computed from user behavior, and therefore 
decoupled from the set of candidate items.  I understand this is not 
Mahout's original purpose, and that it's difficult to build this support 
into Mahout in a principled way.  But it would be helpful to some of us 
if Mahout had that capability.  I accomplished it by wrapping my own 
recommender class around Mahout's delegate, overriding 
estimatePreference(), and using a CandidateSimilarity that allows 
previously rated items.  Perhaps a more acceptable solution is adding a 
post-recommendation processing step that combines the recommendation 
result with a set of userChoosableRecommendedItems, and returns the top 
N items from that combined list.  This design intrudes a lot less into 
Mahout's internals.

Would anyone else benefit from this addition?
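The post-recommendation processing step proposed above could be sketched roughly as follows. This is plain Java with hypothetical names (scores keyed by item ID; `userChoosable` stands in for the userChoosableRecommendedItems set, which is allowed to contain already-rated items and to override predicted scores):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeTopN {

  /**
   * Combine the recommender's predicted items with a separately supplied set
   * of user-choosable items (which may include items the user has already
   * rated), then return the IDs of the top N by score.  When an item appears
   * in both maps, the explicitly supplied score wins.
   */
  static List<Long> topN(Map<Long, Double> predicted,
                         Map<Long, Double> userChoosable,
                         int n) {
    Map<Long, Double> combined = new LinkedHashMap<>(predicted);
    combined.putAll(userChoosable);  // explicit scores override predictions
    List<Map.Entry<Long, Double>> entries = new ArrayList<>(combined.entrySet());
    entries.sort(Comparator.comparingDouble(
        (Map.Entry<Long, Double> e) -> e.getValue()).reversed());
    List<Long> result = new ArrayList<>();
    for (Map.Entry<Long, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
      result.add(e.getKey());
    }
    return result;
  }
}
```

Keeping the combination entirely outside the recommender is what makes this design intrude so little into Mahout's internals.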

On 01/29/2012 12:33 AM, Ted Dunning wrote:
> Also, Lee, I think you have it backwards.  It is true that clicks are not
> the same thing as preferences, but I don't think that I give a fig about
> preferences since they are the internal mental state of the visitor.  I
> know that some of the visitors are mental, but I don't really care since
> that is their own business.  What I care about is encouraging certain
> behaviors.
>
> So I find that mental state estimations are the indirect way to model and
> predict behaviors while directly modeling behaviors based on observed
> behaviors is, well, more direct.
>
> This is compounded by the fact that asking people to rate things invokes a
> really complicated social interaction which does not directly sample the
> internal mental state (i.e. the real preference) but instead measures
> behavior that is a very complicated outcome of social pressures,
> expectations and the internal mental state.  So using ratings boils down to
> using one kind of behavior to estimate mental state that then is
> hypothesized to result in the desired second kind of behavior.
>
>
>
> On Sat, Jan 28, 2012 at 10:51 AM, Sean Owen<sr...@gmail.com>  wrote:
>
>> It means *something* that a user clicked on one item and not 10,000 others.
>> You will learn things like that Star Wars and Star Trek are somehow related
>> from this data. I don't think that clicks are a bad input per se.
>>
>> I agree that it's not obvious how to meaningfully translate user actions
>> into a linear scale. "1" per click and "10" for purchase or something is a
>> guess. I do think you will learn something from the data this way.
>>
>> There is nothing conceptually wrong with mixing real data and estimated
>> data. If the results aren't looking right, it is not a problem with the
>> concept, but the mapping of action onto some rating scale. I think it's
>> hard to get that right, but is not impossible to get it "good".
>>
>> On Sat, Jan 28, 2012 at 10:15 AM, Lee Carroll
>> <le...@googlemail.com>wrote:
>>
>>>> I would argue, though, that .recommend() is aimed at the latter task:
>>> No . I think the mismatch here is you are using at best a wild guess
>>> at a preference for the convenience of using a recommender and then in
>>> the same breath expecting the recommender to "understand" that you are
>>> not using preferences at all and actually have no idea what the user
>>> preference is. You cant have it both ways :-)
>>>
>>> A click through on an item is not a measure of user preference for
>>> that item. I know its not what you want to hear (or better what your
>>> business users want to here) but there it is.
>>>
>>> We can pretend, or maybe even build a convincing narrative that a
>>> click is some sort of item association and use that as a proxy
>>> preference and we might even get some mileage out of it, but we should
>>> not change the behaviour of the .recommend() to hide its short
>>> comings.
>>>


Re: Add on to itemsimilarity

Posted by Sean Owen <sr...@gmail.com>.
Yeah, though that's not recommend(), that's estimatePreference(). It could
throw an exception or something but seemed nicer to return the answer it
has; this isn't an intended use case.

If you include known data in the test, it's going to get the estimate
exactly right, and that doesn't say anything about the recommender, so
known data is not sent to this method in a test anyway.

On Sat, Jan 28, 2012 at 10:06 PM, Lance Norskog <go...@gmail.com> wrote:

> > I think it would be surprising behavior for a recommender to return data
> it
> already knows; I just think the implicit contract is to return only
> predictions. That's how real-world recommender systems appear to behave, to
> the end user; Amazon doesn't show you books you have already read, even if
> indeed they may be some of your favorites ever.
>
> The current recommender algorithms do this.
> SlopeOneRecommender: estimatePreference(), line 123:
>  public float estimatePreference(long userID, long itemID) throws TasteException {
>    DataModel model = getDataModel();
>    Float actualPref = model.getPreferenceValue(userID, itemID);
>    if (actualPref != null) {
>      return actualPref;
>    }
>    return doEstimatePreference(userID, itemID);
>  }
>
> As do all of the other recommenders, where the algorithm makes it
> possible. This makes <1% difference in the RMSE.
>
> On Sat, Jan 28, 2012 at 12:33 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > Also, Lee, I think you have it backwards.  It is true that clicks are not
> > the same thing as preferences, but I don't think that I give a fig about
> > preferences since they are the internal mental state of the visitor.  I
> > know that some of the visitors are mental, but I don't really care since
> > that is their own business.  What I care about is encouraging certain
> > behaviors.
> >
> > So I find that mental state estimations are the indirect way to model and
> > predict behaviors while directly modeling behaviors based on observed
> > behaviors is, well, more direct.
> >
> > This is compounded by the fact that asking people to rate things invokes
> a
> > really complicated social interaction which does not directly sample the
> > internal mental state (i.e. the real preference) but instead measures
> > behavior that is a very complicated outcome of social pressures,
> > expectations and the internal mental state.  So using ratings boils down
> to
> > using one kind of behavior to estimate mental state that then is
> > hypothesized to result in the desired second kind of behavior.
> >
> >
> >
> > On Sat, Jan 28, 2012 at 10:51 AM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> It means *something* that a user clicked on one item and not 10,000
> others.
> >> You will learn things like that Star Wars and Star Trek are somehow
> related
> >> from this data. I don't think that clicks are a bad input per se.
> >>
> >> I agree that it's not obvious how to meaningfully translate user actions
> >> into a linear scale. "1" per click and "10" for purchase or something
> is a
> >> guess. I do think you will learn something from the data this way.
> >>
> >> There is nothing conceptually wrong with mixing real data and estimated
> >> data. If the results aren't looking right, it is not a problem with the
> >> concept, but the mapping of action onto some rating scale. I think it's
> >> hard to get that right, but is not impossible to get it "good".
> >>
> >> On Sat, Jan 28, 2012 at 10:15 AM, Lee Carroll
> >> <le...@googlemail.com>wrote:
> >>
> >> > > I would argue, though, that .recommend() is aimed at the latter
> task:
> >> >
> >> > No . I think the mismatch here is you are using at best a wild guess
> >> > at a preference for the convenience of using a recommender and then in
> >> > the same breath expecting the recommender to "understand" that you are
> >> > not using preferences at all and actually have no idea what the user
> >> > preference is. You cant have it both ways :-)
> >> >
> >> > A click through on an item is not a measure of user preference for
> >> > that item. I know its not what you want to hear (or better what your
> >> > business users want to here) but there it is.
> >> >
> >> > We can pretend, or maybe even build a convincing narrative that a
> >> > click is some sort of item association and use that as a proxy
> >> > preference and we might even get some mileage out of it, but we should
> >> > not change the behaviour of the .recommend() to hide its short
> >> > comings.
> >> >
> >>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Add on to itemsimilarity

Posted by Lance Norskog <go...@gmail.com>.
> I think it would be surprising behavior for a recommender to return data it
> already knows; I just think the implicit contract is to return only
> predictions. That's how real-world recommender systems appear to behave, to
> the end user; Amazon doesn't show you books you have already read, even if
> indeed they may be some of your favorites ever.

The current recommender algorithms do this.
SlopeOneRecommender: estimatePreference(), line 123:
  public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
      return actualPref;
    }
    return doEstimatePreference(userID, itemID);
  }

As do all of the other recommenders, where the algorithm makes it
possible. This makes <1% difference in the RMSE.
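For reference, the RMSE being compared here is just the root of the mean squared difference between estimated and actual preference values; a minimal sketch:

```java
public class Rmse {

  /** Root-mean-square error between estimated and actual preference values. */
  static double rmse(double[] estimated, double[] actual) {
    if (estimated.length != actual.length || estimated.length == 0) {
      throw new IllegalArgumentException("need two equal-length, non-empty arrays");
    }
    double sumSq = 0.0;
    for (int i = 0; i < estimated.length; i++) {
      double diff = estimated[i] - actual[i];
      sumSq += diff * diff;   // accumulate squared errors
    }
    return Math.sqrt(sumSq / estimated.length);
  }
}
```

Returning the known preference from estimatePreference() makes those test pairs contribute zero error, which is why the overall effect on RMSE is so small.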

On Sat, Jan 28, 2012 at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
> Also, Lee, I think you have it backwards.  It is true that clicks are not
> the same thing as preferences, but I don't think that I give a fig about
> preferences since they are the internal mental state of the visitor.  I
> know that some of the visitors are mental, but I don't really care since
> that is their own business.  What I care about is encouraging certain
> behaviors.
>
> So I find that mental state estimations are the indirect way to model and
> predict behaviors while directly modeling behaviors based on observed
> behaviors is, well, more direct.
>
> This is compounded by the fact that asking people to rate things invokes a
> really complicated social interaction which does not directly sample the
> internal mental state (i.e. the real preference) but instead measures
> behavior that is a very complicated outcome of social pressures,
> expectations and the internal mental state.  So using ratings boils down to
> using one kind of behavior to estimate mental state that then is
> hypothesized to result in the desired second kind of behavior.



-- 
Lance Norskog
goksron@gmail.com

Re: Add on to itemsimilarity

Posted by Ted Dunning <te...@gmail.com>.
Also, Lee, I think you have it backwards.  It is true that clicks are not
the same thing as preferences, but I don't think that I give a fig about
preferences since they are the internal mental state of the visitor.  I
know that some of the visitors are mental, but I don't really care since
that is their own business.  What I care about is encouraging certain
behaviors.

So I find that mental state estimations are the indirect way to model and
predict behaviors while directly modeling behaviors based on observed
behaviors is, well, more direct.

This is compounded by the fact that asking people to rate things invokes a
really complicated social interaction which does not directly sample the
internal mental state (i.e. the real preference) but instead measures
behavior that is a very complicated outcome of social pressures,
expectations and the internal mental state.  So using ratings boils down to
using one kind of behavior to estimate mental state that then is
hypothesized to result in the desired second kind of behavior.




Re: Add on to itemsimilarity

Posted by Sean Owen <sr...@gmail.com>.
It means *something* that a user clicked on one item and not 10,000 others.
You will learn things like that Star Wars and Star Trek are somehow related
from this data. I don't think that clicks are a bad input per se.

I agree that it's not obvious how to meaningfully translate user actions
into a linear scale. "1" per click and "10" for purchase or something is a
guess. I do think you will learn something from the data this way.

There is nothing conceptually wrong with mixing real data and estimated
data. If the results aren't looking right, it is not a problem with the
concept, but the mapping of action onto some rating scale. I think it's
hard to get that right, but is not impossible to get it "good".
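
For illustration, a mapping of the kind described above might look like the sketch below. The action names and weights (click = 1, purchase = 10) are the sort of guess discussed in this thread, not values Mahout itself defines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: translate user actions into pseudo-rating values for a recommender.
    The weights here are illustrative guesses, not anything Mahout prescribes. */
class ActionWeights {
    private static final Map<String, Double> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("view", 0.5);      // weakest signal
        WEIGHTS.put("click", 1.0);
        WEIGHTS.put("purchase", 10.0); // strongest signal
    }

    /** Keep the strongest signal seen so far for a user/item pair. */
    static double combine(double current, String action) {
        Double w = WEIGHTS.get(action);
        if (w == null) {
            throw new IllegalArgumentException("unknown action: " + action);
        }
        return Math.max(current, w);
    }

    public static void main(String[] args) {
        double pref = 0.0;
        pref = combine(pref, "click");    // 1.0
        pref = combine(pref, "purchase"); // 10.0
        pref = combine(pref, "click");    // still 10.0: purchase dominates
        System.out.println(pref);
    }
}
```

Taking the max (rather than summing) is just one choice; summing or averaging signals are equally plausible starting points to evaluate against your data.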


Re: Add on to itemsimilarity

Posted by Lee Carroll <le...@googlemail.com>.
> I would argue, though, that .recommend() is aimed at the latter task:

No. I think the mismatch here is that you are using, at best, a wild guess
at a preference for the convenience of using a recommender, and then in
the same breath expecting the recommender to "understand" that you are
not using preferences at all and actually have no idea what the user
preference is. You can't have it both ways :-)

A click-through on an item is not a measure of user preference for
that item. I know it's not what you want to hear (or rather, what your
business users want to hear), but there it is.

We can pretend, or maybe even build a convincing narrative, that a
click is some sort of item association and use it as a proxy
preference, and we might even get some mileage out of it, but we should
not change the behaviour of .recommend() to hide its shortcomings.

Re: Add on to itemsimilarity

Posted by Sean Owen <sr...@gmail.com>.
I think it would be surprising behavior for a recommender to return data it
already knows; I just think the implicit contract is to return only
predictions. That's how real-world recommender systems appear to behave, to
the end user; Amazon doesn't show you books you have already read, even if
indeed they may be some of your favorites ever.

That's how it's built now anyway, so I would prefer not to change it: you
can combine the results with data you already have, if that's what you want,
more easily than you can strip out existing data from the result, if that's
not what you want. You also run the risk of the top items all being existing
data points; then the recommender is not providing any useful extra info.

You can make RecommendedItem for all existing data points, mix with
recommendations, and sort.
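
The "make RecommendedItem for all existing data points, mix with recommendations, and sort" suggestion might be sketched as follows; Scored is a simplified stand-in for Mahout's RecommendedItem (item ID plus value), not the real class:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch: wrap known preferences as scored items, mix them with the
    recommender's predictions, and re-sort to get a combined top-N list. */
class MixKnownAndPredicted {
    record Scored(long itemID, float value) {}

    static List<Scored> topN(Map<Long, Float> knownPrefs,
                             List<Scored> predictions,
                             int n) {
        List<Scored> all = new ArrayList<>(predictions);
        // Add back the items the user already acted on, with their known values.
        knownPrefs.forEach((item, value) -> all.add(new Scored(item, value)));
        all.sort(Comparator.comparingDouble(Scored::value).reversed());
        return all.subList(0, Math.min(n, all.size()));
    }

    public static void main(String[] args) {
        Map<Long, Float> known = Map.of(7L, 10.0f);            // purchased item
        List<Scored> predicted = List.of(new Scored(3L, 0.3f),
                                         new Scored(5L, 2.5f));
        // The known item outranks the predictions, as discussed above.
        System.out.println(topN(known, predicted, 2));
    }
}
```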

If you don't have rating values, then you can't use a recommender built on
predicting ratings, since they will all be 1, and your result is as you say
random. The answer is, don't do that! Either you don't use ratings, and use
the boolean versions, or you do use ratings (like your decaying click
value) and then you can use either.


Re: Add on to itemsimilarity

Posted by Anatoliy Kats <a....@rambler-co.ru>.
So you're proposing that we separate the actions of estimating 
preferences for unknown items, and recommending items to users to click: 
the latter could include some items for which a preference has been 
expressed.  It's a good idea to think that way, thanks for the tip.  I 
would argue, though, that .recommend() is aimed at the latter task:  it 
predicts preferences, and sorts them, and returns the top N items.  It 
is a final step in a process that includes unknown preference estimation 
as an intermediate step.  This is built into Mahout as I see it, by 
separating .recommend() and .estimatePreference().  That's why I still 
think the most elegant solution is simply adding the known preference values 
alongside the predicted ones to the set of possible recommendations.  AFAIK 
this is most easily accomplished by playing around with 
CandidateItemsStrategy implementations.  How would you go about it without having to 
write your own sorting function?

About boolean recommenders:  Many of my users made no purchases, only 
clicks.  So, if I use a generic recommender, it will make random 
recommendations because my training data is essentially boolean.  Has 
anyone else run into this problem?  One solution I am about to try is 
letting the rating value of a click decay with time since the click was 
made.  I am not sure if the ratings will be different enough for 
GenericRecommender to work, and I am also not sure I am justified in 
reducing the item similarity between two items because two users clicked 
on them at different times.  Has anyone tried a solution based on a 
regularized normalization of some sort?
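
The time-decay idea described above could be sketched like this; the exponential shape and the 30-day half-life are arbitrary illustrative choices, not a recommendation:

```java
/** Sketch: decay a click's pseudo-rating with its age, so that ratings are
    not all identically 1.  The 30-day half-life is an arbitrary choice. */
class ClickDecay {
    static final double HALF_LIFE_DAYS = 30.0;

    /** Exponential decay: a click loses half its weight every half-life. */
    static double decayedRating(double baseRating, double ageDays) {
        return baseRating * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
    }

    public static void main(String[] args) {
        System.out.println(decayedRating(1.0, 0));   // fresh click: 1.0
        System.out.println(decayedRating(1.0, 30));  // one half-life: 0.5
        System.out.println(decayedRating(1.0, 60));  // two half-lives: 0.25
    }
}
```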

Thanks.




Re: Add on to itemsimilarity

Posted by Sean Owen <sr...@gmail.com>.
It's correct to think of a recommender as something that can fill in the
blanks. If you transform your input into a numerical scale, it ought to
fill in estimated values for the items you don't have input for. It does
not repeat back to you the input you already have -- these are removed from
results -- on the theory that you already have that information.

It does not mean you can't use both the real and estimated data together,
in the end. You could add back in clicked items, with their known values,
and use that as the basis of something. You should not need to estimate
preference for already-rated items -- you already have that info, right?

So perhaps it is a question of setting the scale correctly? If a click = 1,
and maybe a purchase = 10, then an item estimated at 0.3 is judged to be
less interesting than clicked items. Something else is going wrong with the
data, or rating scale, or even algorithm if these results are consistently
unintuitive.

(Not all recommenders operate by estimating preferences, in particular the
ones that don't use preferences: the ones that deal with 'boolean' data. I
am not sure that is at play here though?)



Re: Add on to itemsimilarity

Posted by Anatoliy Kats <a....@rambler-co.ru>.
I have not seen this discussion from the beginning, but I think the 
troubles I'm having are similar in nature.  We are recommending items 
the user can buy on our website.  Our preferences are past purchases, 
and also past clicks on the item's description.  If a purchase was made, 
certainly we do not want to recommend the item again, but if it was only 
a click, we are even more confident that we should be recommending that 
item.  Yet the recommenders are hardcoded not to.  I managed to get 
around this by changing the recommender's CandidateItemsStrategy.

I also need to estimatePreference() of the items the user clicked on, or 
at least I think I do.  The unclicked items have an estimated preference 
of around 0.3, whereas the click is treated as a rating of 1.  
Intuitively that seems unfair; I'd essentially only be recommending 
items the user clicked on.  I have my own recommender class which uses 
Generic...Recommender() as a delegate.  So, I can override the 
estimatePreference() to return something else, but this concerns me for 
two reasons.  First, this is not estimatePreference()'s intended usage, 
so I'm afraid of breaking something.  Second, many recommenders have a 
private doEstimatePreference() method that I'd love to call for 
already-rated items, but since it is a private method of my delegate, I 
cannot.  That makes me sad.
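
The wrapper described above might be sketched as follows. Estimator here is a simplified stand-in for a Mahout Recommender delegate, not the real interface; the idea is just to return a known preference when one exists and fall back to the delegate's estimate otherwise:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: answer estimatePreference() from known data when available,
    delegating to the underlying recommender only for unknown items. */
class KnownFirstEstimator {
    interface Estimator {
        float estimatePreference(long userID, long itemID);
    }

    private final Map<String, Float> known = new HashMap<>();
    private final Estimator delegate;

    KnownFirstEstimator(Estimator delegate) { this.delegate = delegate; }

    void setKnown(long userID, long itemID, float value) {
        known.put(userID + ":" + itemID, value);
    }

    float estimatePreference(long userID, long itemID) {
        Float v = known.get(userID + ":" + itemID);
        return v != null ? v : delegate.estimatePreference(userID, itemID);
    }

    public static void main(String[] args) {
        // The delegate stands in for a recommender estimating ~0.3 everywhere.
        KnownFirstEstimator r = new KnownFirstEstimator((u, i) -> 0.3f);
        r.setKnown(1L, 42L, 1.0f);                      // a clicked item
        System.out.println(r.estimatePreference(1L, 42L)); // known: 1.0
        System.out.println(r.estimatePreference(1L, 43L)); // estimated: 0.3
    }
}
```

This avoids touching the delegate's private doEstimatePreference() at all, at the cost of deciding yourself what "known" values should be on the same scale as the estimates.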

I hope this helps some of you, and I would appreciate some feedback on 
whether what I'm doing is even a good idea, and how to go about it.

Thanks,

Anatoliy


Re: Add on to itemsimilarity

Posted by Sean Owen <sr...@gmail.com>.
I am not sure that fits in to an item-based recommender since this is data
that is not about your 'items'.

You might use it to influence a user similarity metric in a user-based
computation.

Or better, don't try to use this data yet and see where you get with the
simple implementation.

Sean


RE: Add on to itemsimilarity

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Understood, Sean, thanks for your help. One other question: I am trying to figure out what algorithms I could use along with item similarity that would take in the user's reservation and resort-stay data and tie that into creating additional recommendation data points (to be more specific, additional training data, if you will) that could be fed into the item similarity algorithm.

> Date: Wed, 25 Jan 2012 17:36:49 +0000
> Subject: Re: Add on to itemsimilarity
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> (moving to user@)
> 
> I think I understand more about what you are doing. It doesn't quite make
> sense to say you will train a recommender on the output of the recommender,
> but I understand that you mean you have some information about what users
> have visited what attractions or shows.
> 
> This is classic recommendation. You put that in, and it can tell you what
> other attractions, shows, etc. the user may like.
> 
> So going back to the beginning, I'm not yet clear on why that isn't already
> the answer for you, since you have built this. Explain again what else you
> are trying to do to filter or process the result?
> 
> On Wed, Jan 25, 2012 at 5:25 PM, Saikat Kanjilal <sx...@hotmail.com>wrote:
> 
> >
> > Putting back on the list, we want to recommend new items in the park, an
> > item could be:1) attraction2) restaurant3) show4) Ride5) resort
> > Our real data if you will is the recommendations that result from
> > understanding their preferences in more detail based on their reservations
> > and resort stays.  So I wonder if our real data is our training data that
> > the recommender can use for training and calculate predicted data based on
> > that.
> >
> > Date: Wed, 25 Jan 2012 17:20:02 +0000
> > Subject: Re: Add on to itemsimilarity
> > From: srowen@gmail.com
> > To: sxk1969@hotmail.com
> >
> > (do you mind putting this back on the list? might be a good discussion for
> > others)
> > What are you recommending to the user -- theme parks, rides at a theme
> > park?
> > Yes, you would always be recommending 'unknown' things to the user. You
> > already 'know' how much they like or dislike the things for which you have
> > data, so recommendations aren't of use to you.
> >
> > Of course, you can use both real and predicted data in your system -- it
> > depends on what you are trying to accomplish. The recommender's role is
> > creating the predicted data.
> >
> >
> > On Wed, Jan 25, 2012 at 5:12 PM, Saikat Kanjilal <sx...@hotmail.com>
> > wrote:
> >
> >
> >
> >
> >
> > Actually, let me be more clear: we are building a recommendations engine for a
> > theme parks experience,  the user preferences is something we are storing
> > based on the user's reservations and analytics, this is something that's
> > stored before the user rates any items and may or may not have a direct
> > relationship to the recommendations the user makes as they go around the
> > park.  This is due to the fact that the user recommendations could be other
> > rides or attractions that exist outside of the actual preferences.  Its not
> > clear yet to me how to tie these preferences into the item similarity
> > results.
> >
> >