Posted to user@mahout.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2009/01/28 19:51:46 UTC

Taste: Clearing CachingRecommender

Hi,

I'm trying out CachingRecommender to see how much it improves performance for me (it does a lot when the cache is all warmed up).
One thing that I assumed would happen automatically is that the CachingRecommender will be refreshed when the underlying file (FileDataModel) is refreshed.  Instead, I do see the FDM re-reading the input file when it detects the change, but my recommendations remain cached forever, as I'm not hitting memory limits.

I've been looking at the code, but can't see a clean way to "listen" to the FDM and find out when it finished re-reading the file, so I can clear my CachingRecommender.  Is there an existing way this should be done that I'm just not seeing in the code?

Ah, I see another thing now.  Consider this:

    Recommender recommender = new BooleanUserGenericUserBasedRecommender(model, hood, similarity);
    recommender = new CachingRecommender(recommender);

Now if I did somehow find a way to clear the caching recommender, I imagine I'd want to call one of its clear(...) methods, and this requires a cast back to CachingRecommender.  Is there a better way?
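
One way to avoid the cast is to keep the caching wrapper in a variable of its own concrete type and hand only the interface type to the rest of the application. The sketch below uses small stand-in types (a one-method Recommender interface and a SimpleCachingRecommender class) rather than the real Taste classes, so every name and signature in it is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the Taste Recommender interface (hypothetical, simplified).
interface Recommender {
    List<String> recommend(String userId);
}

// Stand-in for a caching decorator; not the real CachingRecommender.
final class SimpleCachingRecommender implements Recommender {
    private final Recommender delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    SimpleCachingRecommender(Recommender delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized List<String> recommend(String userId) {
        // Compute once per user, then serve later calls from the cache.
        return cache.computeIfAbsent(userId, delegate::recommend);
    }

    // Drops every cached result so the next call recomputes.
    public synchronized void clear() {
        cache.clear();
    }
}

final class TypedReferenceDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        Recommender base = userId -> {
            calls[0]++;
            return List.of("item-" + calls[0]);
        };

        // Keep the concrete type for cache management...
        SimpleCachingRecommender caching = new SimpleCachingRecommender(base);
        // ...and expose only the interface to callers.
        Recommender recommender = caching;

        System.out.println(recommender.recommend("u1")); // computed
        System.out.println(recommender.recommend("u1")); // served from cache
        caching.clear();                                 // no cast needed
        System.out.println(recommender.recommend("u1")); // recomputed
    }
}
```

The two variables point at the same object, so clearing through the typed reference affects what callers see through the interface reference.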

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Taste: Clearing CachingRecommender

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,


----- Original Message ----
> From: Sean Owen <sr...@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Thursday, January 29, 2009 4:05:50 PM
> Subject: Re: Taste: Clearing CachingRecommender
> 
> Sounds like you want one-way dependencies which I agree with, largely
> for simplicity. That's how it works now. The Recommender does not
> depend on the DataModel, but the other way around. Or are you in fact
> arguing for the other way around?

I'm referring to the fact that the DataModel changes first, and when it changes then the Recommender should know about that.

> There is a sound logic to that of course; when data changes, that
> signal starts from the DataModel and propagates from there. I suppose

That's the flow and dependency I was describing.

> I'd done it the other way just because that was the order in which the
> components depend on each other to produce recommendations. A
> Recommender has a reference to a DataModel and not the other way
> around.

Those references make sense.  It wouldn't make sense for DM to be aware of Recommender.  That's why I was suggesting a generic event/listener type of system, so that any listener interested in DM changes can register and DM will call it without knowing who exactly it is calling.

> One last interesting factor is that in some cases (particularly
> slope-one) it is possible to very quickly take into account a single
> preference change without recomputing everything. This is why
> Recommender has setPreference(); ultimately that calls to
> DataModel.setPreference() but along the way perhaps the Recommender
> can do better than simply clearing a cache.
> 
> All this said I am not convinced it wouldn't be better to completely
> turn around the dependencies here. I'd like to make a small change to
> FileDataModel now, per this thread, and then mull it over a little bit
> since I'd want to think through the implications a bit. It's
> medium-sized surgery on the code.

I think I'm confused by the direction of dependencies now, but I'll wait and see your changes next.

Thanks!
Otis



Re: Taste: Clearing CachingRecommender

Posted by Sean Owen <sr...@gmail.com>.
Sounds like you want one-way dependencies which I agree with, largely
for simplicity. That's how it works now. The Recommender does not
depend on the DataModel, but the other way around. Or are you in fact
arguing for the other way around?

There is a sound logic to that of course; when data changes, that
signal starts from the DataModel and propagates from there. I suppose
I'd done it the other way just because that was the order in which the
components depend on each other to produce recommendations. A
Recommender has a reference to a DataModel and not the other way
around.

One last interesting factor is that in some cases (particularly
slope-one) it is possible to very quickly take into account a single
preference change without recomputing everything. This is why
Recommender has setPreference(); ultimately that calls to
DataModel.setPreference() but along the way perhaps the Recommender
can do better than simply clearing a cache.
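
To make the slope-one remark concrete, here is a rough sketch of how a single new preference can be folded into pairwise average rating differences without recomputing them from scratch. It is not the Taste slope-one implementation; the class IncrementalDiffs, its methods, and the string IDs are all hypothetical, and it assumes a user never re-rates an item.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps running sums of rating differences per item pair.
final class IncrementalDiffs {
    // userId -> (itemId -> rating)
    private final Map<String, Map<String, Double>> prefs = new HashMap<>();
    // "a|b" -> {sum of (rating(a) - rating(b)), number of users who rated both}
    // Assumes item IDs do not contain '|'.
    private final Map<String, double[]> diffs = new HashMap<>();

    // Adds a new rating and touches only the pairs that involve this item.
    void setPreference(String user, String item, double value) {
        Map<String, Double> userPrefs = prefs.computeIfAbsent(user, u -> new HashMap<>());
        if (userPrefs.containsKey(item)) {
            throw new IllegalArgumentException("re-rating is not handled in this sketch");
        }
        for (Map.Entry<String, Double> e : userPrefs.entrySet()) {
            add(item, e.getKey(), value - e.getValue());
            add(e.getKey(), item, e.getValue() - value);
        }
        userPrefs.put(item, value);
    }

    private void add(String a, String b, double delta) {
        double[] cell = diffs.computeIfAbsent(a + "|" + b, k -> new double[2]);
        cell[0] += delta;
        cell[1] += 1;
    }

    // Average of rating(a) - rating(b) over users who rated both, or NaN if none.
    double averageDiff(String a, String b) {
        double[] cell = diffs.get(a + "|" + b);
        return cell == null ? Double.NaN : cell[0] / cell[1];
    }
}

final class IncrementalDiffsDemo {
    public static void main(String[] args) {
        IncrementalDiffs d = new IncrementalDiffs();
        d.setPreference("u1", "A", 5.0);
        d.setPreference("u1", "B", 3.0);
        d.setPreference("u2", "A", 4.0);
        d.setPreference("u2", "B", 4.0);
        System.out.println(d.averageDiff("A", "B"));
    }
}
```

Each call does work proportional to the number of items that one user has already rated, which is the kind of cheap update a plain cache flush cannot take advantage of.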

All this said I am not convinced it wouldn't be better to completely
turn around the dependencies here. I'd like to make a small change to
FileDataModel now, per this thread, and then mull it over a little bit
since I'd want to think through the implications a bit. It's
medium-sized surgery on the code.

On Thu, Jan 29, 2009 at 6:47 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Right, a small change could affect recommendations.  But wanting to see the change immediately means real-time changes, which are hard (though some people manage to pull them off, e.g. Digg now and Findory in the past).
>
>
> The nice thing about events is that the dependency is one way and components up the dependency chain are not directly coupled to their dependents.  I think we should ensure this remains the case in Taste.  In other words, exposing some public method for refreshing the CachingRecommender and expecting something like FileDataModel to call it when FDM reloads would be bad.  So, unless I'm missing something, this takes us back to before/after event hooks... but you may think of something better! :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Sean Owen <sr...@gmail.com>
>> To: mahout-user@lucene.apache.org
>> Sent: Wednesday, January 28, 2009 4:58:26 PM
>> Subject: Re: Taste: Clearing CachingRecommender
>>
>> Yes, though you only need to rebuild if the file has changed of
>> course. I like this so I will work on the change.
>>
>> This highlights the general problem that a change of just one data
>> point, in theory, affects all recommendations.
>>
>> On Wed, Jan 28, 2009 at 7:33 PM, Otis Gospodnetic
>> wrote:
>> > At least in this particular case, on-demand reloading is fine (and nicely deterministic).... although, isn't that going to be more or less the same as simply throwing away an existing CachingRecommender instance and re-creating a brand new one, starting from the newly created FileDataModel?
>
>

Re: Taste: Clearing CachingRecommender

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Right, a small change could affect recommendations.  But wanting to see the change immediately means real-time changes, which are hard (though some people manage to pull them off, e.g. Digg now and Findory in the past).


The nice thing about events is that the dependency is one way and components up the dependency chain are not directly coupled to their dependents.  I think we should ensure this remains the case in Taste.  In other words, exposing some public method for refreshing the CachingRecommender and expecting something like FileDataModel to call it when FDM reloads would be bad.  So, unless I'm missing something, this takes us back to before/after event hooks... but you may think of something better! :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Sean Owen <sr...@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Wednesday, January 28, 2009 4:58:26 PM
> Subject: Re: Taste: Clearing CachingRecommender
> 
> Yes, though you only need to rebuild if the file has changed of
> course. I like this so I will work on the change.
> 
> This highlights the general problem that a change of just one data
> point, in theory, affects all recommendations.
> 
> On Wed, Jan 28, 2009 at 7:33 PM, Otis Gospodnetic
> wrote:
> > At least in this particular case, on-demand reloading is fine (and nicely deterministic).... although, isn't that going to be more or less the same as simply throwing away an existing CachingRecommender instance and re-creating a brand new one, starting from the newly created FileDataModel?


Re: Taste: Clearing CachingRecommender

Posted by Sean Owen <sr...@gmail.com>.
Yes, though you only need to rebuild if the file has changed of
course. I like this so I will work on the change.

This highlights the general problem that a change of just one data
point, in theory, affects all recommendations.

On Wed, Jan 28, 2009 at 7:33 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> At least in this particular case, on-demand reloading is fine (and nicely deterministic).... although, isn't that going to be more or less the same as simply throwing away an existing CachingRecommender instance and re-creating a brand new one, starting from the newly created FileDataModel?

Re: Taste: Clearing CachingRecommender

Posted by Otis Gospodnetic <ot...@yahoo.com>.
At least in this particular case, on-demand reloading is fine (and nicely deterministic).... although, isn't that going to be more or less the same as simply throwing away an existing CachingRecommender instance and re-creating a brand new one, starting from the newly created FileDataModel?


We have a similar situation in Solr-land.  There, one can build a special index used for spellchecking.  This spellcheck index reads data from the main index, so when the main index changes, it's kind of important to rebuild the spellcheck index, and this is how that's done:

http://wiki.apache.org/solr/SpellCheckComponent#head-4375b11a78463f5f8b70967074d0787ea3778592

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Sean Owen <sr...@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Wednesday, January 28, 2009 2:19:52 PM
> Subject: Re: Taste: Clearing CachingRecommender
> 
> ... yeah, I had hoped to avoid the complexities that come from making
> this a full-fledged arbitrary dependency system and dealing with
> circularities and all that.
> 
> Perhaps to rationalize this particular situation, I can replace the
> part that auto-reloads every x minutes, with a mechanism that will
> reload on demand, but not more than every x minutes. I think that fits
> the model better actually, and should be more efficient.
> 
> Or is it important to get updates more quickly -- how quickly?
> 
> On Wed, Jan 28, 2009 at 7:08 PM, Otis Gospodnetic
> wrote:
> > Would it be possible to create appropriate abstract methods in appropriate classes (sorry for vagueness, I'm not yet familiar with the code enough to suggest exact places) that would allow dependent classes to listen for events triggered/processed by classes they depend on?  Here is a concrete example.
> > I like how the FDM re-checks the file and re-loads it.  The Recommender depends on data read by FDM.  So I would hope I could add Recommender as an FDM event listener.  The FDM might have a method such as addListener(FDMListener), abstract beforeReload(...) and abstract afterReload(...).  These methods would be implemented by Recommender and called by the FDM, so the Recommender could then clear its cache or do nothing.


Re: Taste: Clearing CachingRecommender

Posted by Sean Owen <sr...@gmail.com>.
... yeah, I had hoped to avoid the complexities that come from making
this a full-fledged arbitrary dependency system and dealing with
circularities and all that.

Perhaps to rationalize this particular situation, I can replace the
part that auto-reloads every x minutes, with a mechanism that will
reload on demand, but not more than every x minutes. I think that fits
the model better actually, and should be more efficient.

Or is it important to get updates more quickly -- how quickly?
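
The "reload on demand, but not more than every x minutes" idea can be sketched roughly as below. This is only an illustration under assumptions: ThrottledReloader, its constructor parameters, and the injected clock are hypothetical names, and the reload action is a placeholder for whatever FileDataModel would actually do.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: runs a reload when asked, but at most once per interval.
final class ThrottledReloader {
    private final long minIntervalMillis;
    private final LongSupplier clock;      // injected so the sketch is testable
    private final Runnable reloadAction;   // placeholder for re-reading the file
    private long lastReloadMillis;
    private boolean reloadedOnce;

    ThrottledReloader(long minIntervalMillis, LongSupplier clock, Runnable reloadAction) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        this.reloadAction = reloadAction;
    }

    // Called by whoever wants fresh data; returns true if a reload actually ran.
    synchronized boolean reloadIfDue() {
        long now = clock.getAsLong();
        if (reloadedOnce && now - lastReloadMillis < minIntervalMillis) {
            return false; // asked too soon after the previous reload
        }
        reloadAction.run();
        lastReloadMillis = now;
        reloadedOnce = true;
        return true;
    }
}

final class ThrottledReloaderDemo {
    public static void main(String[] args) {
        ThrottledReloader reloader = new ThrottledReloader(
                60_000L, System::currentTimeMillis, () -> System.out.println("reloading"));
        System.out.println(reloader.reloadIfDue()); // first request always reloads
        System.out.println(reloader.reloadIfDue()); // immediate second request is skipped
    }
}
```

Because the reload only happens when something asks for it, the caller knows exactly when the data changed and can clear any cache at the same moment.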

On Wed, Jan 28, 2009 at 7:08 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Would it be possible to create appropriate abstract methods in appropriate classes (sorry for vagueness, I'm not yet familiar with the code enough to suggest exact places) that would allow dependent classes to listen for events triggered/processed by classes they depend on?  Here is a concrete example.
> I like how the FDM re-checks the file and re-loads it.  The Recommender depends on data read by FDM.  So I would hope I could add Recommender as an FDM event listener.  The FDM might have a method such as addListener(FDMListener), abstract beforeReload(...) and abstract afterReload(...).  These methods would be implemented by Recommender and called by the FDM, so the Recommender could then clear its cache or do nothing.

Re: Taste: Clearing CachingRecommender

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Would it be possible to create appropriate abstract methods in appropriate classes (sorry for vagueness, I'm not yet familiar with the code enough to suggest exact places) that would allow dependent classes to listen for events triggered/processed by classes they depend on?  Here is a concrete example.
I like how the FDM re-checks the file and re-loads it.  The Recommender depends on data read by FDM.  So I would hope I could add Recommender as an FDM event listener.  The FDM might have a method such as addListener(FDMListener), abstract beforeReload(...) and abstract afterReload(...).  These methods would be implemented by Recommender and called by the FDM, so the Recommender could then clear its cache or do nothing.

Thoughts so far?  If the above is doable, would it work for other dependency cases in Taste?
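
A minimal sketch of the listener idea described above might look like this. None of these types exist in Taste as far as this thread shows: ReloadListener, ObservableDataModel, and ListeningCache are hypothetical names, the method names follow the ones proposed in the message, and the sketch is not thread-safe apart from the listener list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface, using the method names proposed above.
interface ReloadListener {
    void beforeReload();
    void afterReload();
}

// Stand-in for a file-backed data model; not the real FileDataModel.
final class ObservableDataModel {
    private final List<ReloadListener> listeners = new CopyOnWriteArrayList<>();
    private int generation = 0;

    void addListener(ReloadListener listener) {
        listeners.add(listener);
    }

    int generation() {
        return generation;
    }

    // The model only knows the listener interface, never who is listening.
    void reload() {
        for (ReloadListener l : listeners) l.beforeReload();
        generation++; // placeholder for re-reading the input file
        for (ReloadListener l : listeners) l.afterReload();
    }
}

// A cache owner that reacts to reloads by dropping its cached entries.
final class ListeningCache implements ReloadListener {
    private final Map<String, String> cache = new HashMap<>();

    void put(String key, String value) { cache.put(key, value); }
    int size() { return cache.size(); }

    @Override public void beforeReload() { /* nothing to do in this sketch */ }
    @Override public void afterReload() { cache.clear(); }
}

final class ListenerDemo {
    public static void main(String[] args) {
        ObservableDataModel model = new ObservableDataModel();
        ListeningCache cache = new ListeningCache();
        model.addListener(cache);

        cache.put("u1", "item-42");
        model.reload();
        System.out.println(cache.size()); // cache was cleared by afterReload()
    }
}
```

The data model depends only on the small listener interface, which is the one-way coupling the proposal is after.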


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Sean Owen <sr...@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Wednesday, January 28, 2009 1:59:48 PM
> Subject: Re: Taste: Clearing CachingRecommender
> 
> Yeah, I have really struggled with this piece. The original vision was
> to support real-time updates -- recommendations always use all the
> latest data. It was quickly clear that was far too slow and wasteful.
> So I bolted on caching... and started putting it everywhere. And it
> quickly became a bit of a mess to manage correctly expiring cache
> entries and so on.
> 
> I ended up, over time, with the reasonable "Refreshable" scheme which
> lets components update their state (and caches) after having
> efficiently had their downstream dependencies update.
> 
> You've found a part where that kind of breaks down. I suppose the idea
> is that updates are driven solely from upstream components. The
> Recommender decides when to update and passes that message down. Under
> that view, the behavior is correct -- the FileDataModel is welcome to
> update itself, and eventually the upstream components will care to
> re-read from it.
> 
> But then perhaps this automatic reloading is pointless. It should
> happen on demand, perhaps with a limit to avoid re-reading updates too
> frequently or something.
> 
> Let me pause and ask for your thoughts?
> 


Re: Taste: Clearing CachingRecommender

Posted by Sean Owen <sr...@gmail.com>.
Yeah, I have really struggled with this piece. The original vision was
to support real-time updates -- recommendations always use all the
latest data. It was quickly clear that was far too slow and wasteful.
So I bolted on caching... and started putting it everywhere. And it
quickly became a bit of a mess to manage correctly expiring cache
entries and so on.

I ended up, over time, with the reasonable "Refreshable" scheme which
lets components update their state (and caches) after having
efficiently had their downstream dependencies update.

You've found a part where that kind of breaks down. I suppose the idea
is that updates are driven solely from upstream components. The
Recommender decides when to update and passes that message down. Under
that view, the behavior is correct -- the FileDataModel is welcome to
update itself, and eventually the upstream components will care to
re-read from it.
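
The refresh flow described here, and the gap it leaves, can be illustrated with a small sketch. The interface below is a simplified stand-in named SimpleRefreshable, not the Taste "Refreshable" interface, and the two stub classes are hypothetical.

```java
// Hypothetical stand-in for the "Refreshable" idea; not the Taste interface.
interface SimpleRefreshable {
    void refresh();
}

// Bottom of the chain: owns the data and can re-read it on request.
final class StubDataModel implements SimpleRefreshable {
    private int fileVersion = 1;   // what is on disk
    private int loadedVersion = 1; // what is in memory

    void fileChangedOnDisk() { fileVersion++; }
    int loadedVersion() { return loadedVersion; }

    @Override
    public void refresh() { loadedVersion = fileVersion; }
}

// Top of the chain: caches a value derived from the model.
final class StubCachingRecommender implements SimpleRefreshable {
    private final StubDataModel model;
    private Integer cached;

    StubCachingRecommender(StubDataModel model) { this.model = model; }

    int recommendation() {
        if (cached == null) cached = model.loadedVersion() * 10;
        return cached;
    }

    @Override
    public void refresh() {
        model.refresh(); // refresh the dependency first
        cached = null;   // then drop state derived from it
    }
}

final class RefreshDemo {
    public static void main(String[] args) {
        StubDataModel model = new StubDataModel();
        StubCachingRecommender recommender = new StubCachingRecommender(model);

        System.out.println(recommender.recommendation());
        model.fileChangedOnDisk();
        model.refresh();                                  // model reloads on its own
        System.out.println(recommender.recommendation()); // cache is still stale
        recommender.refresh();                            // refresh driven from the top
        System.out.println(recommender.recommendation());
    }
}
```

When the model reloads by itself, nothing tells the caching layer, which matches the behavior reported at the start of the thread; only a refresh started at the top of the chain clears the cached value.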

But then perhaps this automatic reloading is pointless. It should
happen on demand, perhaps with a limit to avoid re-reading updates too
frequently or something.

Let me pause and ask for your thoughts?

On Wed, Jan 28, 2009 at 6:51 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Hi,
>
> I'm trying out CachingRecommender to see how much it improves performance for me (it does a lot when the cache is all warmed up).
> One thing that I assumed would happen automatically is that the CachingRecommender will be refreshed when the underlying file (FileDataModel) is refreshed.  Instead, I do see the FDM re-reading the input file when it detects the change, but my recommendations remain cached forever, as I'm not hitting memory limits.
>
> I've been looking at the code, but can't see a clean way to "listen" to the FDM and find out when it finished re-reading the file, so I can clear my CachingRecommender.  Is there an existing way this should be done that I'm just not seeing in the code?
>
> Ah, I see another thing now.  Consider this:
>
>    Recommender recommender = new BooleanUserGenericUserBasedRecommender(model, hood, similarity);
>    recommender = new CachingRecommender(recommender);
>
> Now if I did somehow find a way to clear the caching recommender, I imagine I'd want to call one of its clear(...) methods, and this requires a cast back to CachingRecommender.  Is there a better way?
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>