Posted to user@mahout.apache.org by Saikat Kanjilal <sx...@hotmail.com> on 2012/04/10 17:41:38 UTC

Evaluation of recommenders

Hi everyone,

We're looking at building out some clustering and classification algorithms using Mahout, and one of the things we're also looking at is building performance metrics around each of these algorithms as we go down the path of choosing the best model in an iterative, closed feedback loop (i.e. our business users manipulate the weights for each attribute of our feature vectors, we use those changes to regenerate a model asynchronously with the appropriate clustering/classification algorithms, and we then replenish our online component with the newly recalculated data for fresh recommendations). So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly. I was wondering if anyone has done this type of analysis before and, if so, which approaches have worked well and which haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
Regards
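
For concreteness, here is a minimal sketch of the kind of offline comparison over a basket of algorithms described above, using Mahout's Taste evaluator API. The two candidate recommenders and the ratings.csv preference file are illustrative assumptions, not a prescription:

import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public final class BasketEvaluation {
  public static void main(String[] args) throws Exception {
    // ratings.csv is a hypothetical "userID,itemID,rating" preference file.
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // The "basket": each candidate algorithm is wrapped in a RecommenderBuilder.
    Map<String, RecommenderBuilder> basket = new LinkedHashMap<String, RecommenderBuilder>();
    basket.put("user-based/Pearson", new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel m) throws TasteException {
        UserSimilarity sim = new PearsonCorrelationSimilarity(m);
        return new GenericUserBasedRecommender(m, new NearestNUserNeighborhood(25, sim, m), sim);
      }
    });
    basket.put("item-based/log-likelihood", new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel m) throws TasteException {
        return new GenericItemBasedRecommender(m, new LogLikelihoodSimilarity(m));
      }
    });

    // Hold out 30% of each user's preferences and compare estimated vs. actual ratings.
    RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
    String best = null;
    double bestScore = Double.MAX_VALUE;
    for (Map.Entry<String, RecommenderBuilder> candidate : basket.entrySet()) {
      double rmse = evaluator.evaluate(candidate.getValue(), null, model, 0.7, 1.0);
      System.out.println(candidate.getKey() + " RMSE = " + rmse);
      if (rmse < bestScore) {
        bestScore = rmse;
        best = candidate.getKey();
      }
    }
    System.out.println("Best candidate this run: " + best);
  }
}

The winning candidate (and its tuned parameters) from a run like this would then be the one used to regenerate the recommendations, which is essentially the offline selection loop discussed in the replies below.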

Re: Evaluation of recommenders

Posted by Saikat Kanjilal <sx...@hotmail.com>.
We'll be writing this ourselves :))) The goal is to build a best-of-breed recommendation engine for our division of the company and then generalize it for the larger company. Thanks for the heads up.

Sent from my iPhone

On Apr 11, 2012, at 12:31 AM, Manuel Blechschmidt <Ma...@gmx.de> wrote:

> Hi Saikat,
> I wrote my master's thesis about evaluating recommenders on real-world examples:
> 
> https://source.apaxo.de/svn/semrecsys/trunk/doc/2010-Manuel-Blechschmidt-730786-EvalRecSys.pdf
> 
> So what you are going to do is current research, which means there is not yet a lot of practical experience to draw on.
> 
> In 2009 there was an online evaluation challenge which was part of ECML PKDD.
> 
> 2009 ECML PKDD Discovery Challenge: Online Tag Recommendations. http://www.kde.cs.uni-kassel.de/ws/dc09/online. Version: 2009, checked: 2011-04-23
> 
> You will have to run all your recommenders in parallel to figure out which one is best for optimizing your business goals. I founded a company which is developing the described technology, and I am currently searching for a project starting in July 2012 where I can try this. So if you are interested in hiring me, feel free to send me a personal message.
> 
> /Manuel
> 
> On 10.04.2012, at 17:41, Saikat Kanjilal wrote:
> 
>> 
>> Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
>> Regards                           
> 
> -- 
> Manuel Blechschmidt
> CTO - Apaxo GmbH
> blechschmidt@apaxo.de
> http://www.apaxo.de
> 
> Weinbergstr. 16
> 14469 Potsdam
> 
> Telefon +49 (0)6204 9180 593
> Fax +49 (0)6204 9180 594
> Mobil: +49 173/6322621
> 
> Skype: Manuel_B86
> Twitter: http://twitter.com/Manuel_B
> 
> Sitz der Gesellschaft: Viernheim
> Handelsregister HRB 87159
> Ust-IdNr. DE261368874
> Amtsgericht Darmstadt
> Geschäftsführer Friedhelm Scharhag
> 
> 

Re: Evaluation of recommenders

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hi Saikat,
I wrote my master's thesis about evaluating recommenders on real-world examples:

https://source.apaxo.de/svn/semrecsys/trunk/doc/2010-Manuel-Blechschmidt-730786-EvalRecSys.pdf

So what you are going to do is current research, which means there is not yet a lot of practical experience to draw on.

In 2009 there was an online evaluation challenge which was part of ECML PKDD.

2009 ECML PKDD Discovery Challenge: Online Tag Recommendations. http://www.kde.cs.uni-kassel.de/ws/dc09/online. Version: 2009, checked: 2011-04-23

You will have to run all your recommenders in parallel to figure out which one is best for optimizing your business goals. I founded a company which is developing the described technology, and I am currently searching for a project starting in July 2012 where I can try this. So if you are interested in hiring me, feel free to send me a personal message.
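
As a rough illustration of the "run them in parallel" idea above (this is not anything Mahout ships; the variant names and the stdout "log" are made up), each user can be assigned deterministically to one recommender variant so that the business metric for each variant can be compared afterwards:

import java.util.Arrays;
import java.util.List;

// Deterministically buckets users across the candidate recommenders so each
// variant's business metric (clicks, conversions) can be measured separately.
// Sketch only: the variant names and the stdout "log" are hypothetical.
public final class VariantRouter {

  private final List<String> variantNames;

  public VariantRouter(List<String> variantNames) {
    this.variantNames = variantNames;
  }

  // The same user always lands in the same bucket, so day-over-day metrics stay comparable.
  public String variantFor(long userId) {
    int bucket = (int) (((userId * 2654435761L) & Long.MAX_VALUE) % variantNames.size());
    return variantNames.get(bucket);
  }

  // Record an impression or a click against the variant that served it.
  public void logEvent(long userId, String eventType) {
    System.out.println(variantFor(userId) + "\t" + userId + "\t" + eventType);
  }

  public static void main(String[] args) {
    VariantRouter router = new VariantRouter(Arrays.asList("user-based", "item-based", "svd"));
    router.logEvent(12345L, "impression");
    router.logEvent(12345L, "click");
  }
}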

/Manuel

On 10.04.2012, at 17:41, Saikat Kanjilal wrote:

> 
> Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
> Regards  		 	   		  

-- 
Manuel Blechschmidt
CTO - Apaxo GmbH
blechschmidt@apaxo.de
http://www.apaxo.de

Weinbergstr. 16
14469 Potsdam

Telefon +49 (0)6204 9180 593
Fax +49 (0)6204 9180 594
Mobil: +49 173/6322621

Skype: Manuel_B86
Twitter: http://twitter.com/Manuel_B

Sitz der Gesellschaft: Viernheim
Handelsregister HRB 87159
Ust-IdNr. DE261368874
Amtsgericht Darmstadt
Geschäftsführer Friedhelm Scharhag


RE: Evaluation of recommenders

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Yes, we have business users who are measuring a real-world metric and in turn provide that feedback by putting weightings on some algorithm parameters to tweak the results; the results should be different and will be driven from this.
Thanks again for your insight on recommender metrics; we will look at implementing these and will post more as we get this off the ground and run into challenging scenarios.

> Date: Tue, 10 Apr 2012 16:34:33 -0500
> Subject: Re: Evaluation of recommenders
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> You are making recommendations, and you want to do this via
> clustering. OK, that's fine. How you implement it isn't so important
> -- it's that you have some parameters to change and want to know how
> any given process does.
> 
> You just want to use some standard recommender metrics, to start, I'd
> imagine. If you're estimating ratings -- root mean squared error of
> the difference between estimate and actual on the training data. Or
> you can fall back to precision, recall, and nDCG as a form of score.
> So, yes, definitely well-established approaches here.
> 
> I have this sense that you are saying you have business users who are
> going to measure some real-world metric (conversion rate, uplift,
> clickthrough), and guess at some changes to algorithm parameters that
> might make them better. If you have *that* kind of feedback -- much
> better. That is a far more realistic metric. Of course, it's much
> harder to experiment when using that metric since you have to run the
> algo for a day or something to collect data.
> 
> It's a separate question, but I don't know if in the end a business
> user can meaningfully decide weights on feature vectors. I mean, I
> couldn't eyeball those kinds of things. It may just be how you need to
> do things, but would double-check that everyone has a similar and
> reasonable expectation about what these inputs are and what they do.
> 
> 
> On Tue, Apr 10, 2012 at 3:23 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >
> > I see the architecture similar to  the following:
> >
> > Asynchronously:Given a set of feature vectors    run clustering/classification algorithms for each of our feature vectors to create the appropriate buckets for the set of users, feed the result of these computations into the synchronous database.
> > Synchronously:For each bucket run item similarity recommendation algorithms to display a real time set of recommendations for each user
> >
> > For the asynchronous computations we need the ability to tweak the weights associated with each feature of the feature vectors (typical features might include income/age/dining preferences etc) and we need the business folks to adjust the weights associated with each of these to regenerate the async buckets
> >
> > So given the above architecture we need the ability for the async computations to judge which algorithm to use based on a set of performance measuring criteria, that was the heart of my initial question, whether folks have built this sort of framework and what are some things to think about when building this.
> > Thanks for your feedback
> >
> >
> >
> >> Date: Tue, 10 Apr 2012 14:33:56 -0500
> >> Subject: Re: Evaluation of recommenders
> >> From: srowen@gmail.com
> >> To: user@mahout.apache.org
> >>
> >> You're talking about recommendations now... are we talking about a
> >> clustering, classification or recommender system?
> >>
> >> In general I don't know if it makes sense for business users to be
> >> deciding aspects of the internal model. At most someone should input
> >> the tradeoffs -- how important is accuracy vs speed? those kinds of
> >> things. Then it's an optimization problem. But, understood, maybe you
> >> need to let people explore these things manually at first.
> >>
> >> On Tue, Apr 10, 2012 at 2:21 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >> >
> >> > The question really is what are some tried approaches to figure out how to measure the quality of a set of algorithms currently being used for clustering/classification?
> >> >
> >> > And in thinking about this some more we also need to be able to regenerate models as soon as the business users tweak the weights associated with features inside a feature vector, we need to figure out a way to efficiently tie this into our online workflow which could show updated recommendations every few hours?
> >> >
> >> > When I say picking an algorithm on the fly what I mean is that we need to continuously test our basket of algorithms based on a new set of training data and make the determination offline as to which of the algorithms to use at that moment to regenerate our recommendations.
> >> >> Date: Tue, 10 Apr 2012 14:08:17 -0500
> >> >> Subject: Re: Evaluation of recommenders
> >> >> From: srowen@gmail.com
> >> >> To: user@mahout.apache.org
> >> >>
> >> >> Picking an algorithm 'on the fly' is almost surely not realistic --
> >> >> well, I am not sure what eval process you would run in milliseconds.
> >> >> But it's also unnecessary; you usually run evaluations offline on
> >> >> training/test data that reflects real input, and then, the resulting
> >> >> tuning should be fine for that real input that comes the next day.
> >> >>
> >> >> Is that really the question, or are you just asking about how you
> >> >> measure the quality of clustering or a classifier?
> >> >>
> >> >> On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >> >> >
> >> >> > Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
> >> >> > Regards
> >> >
> >

Re: Evaluation of recommenders

Posted by Sean Owen <sr...@gmail.com>.
You are making recommendations, and you want to do this via
clustering. OK, that's fine. How you implement it isn't so important
-- it's that you have some parameters to change and want to know how
any given process does.

You just want to use some standard recommender metrics, to start, I'd
imagine. If you're estimating ratings -- root mean squared error of
the difference between estimate and actual on the training data. Or
you can fall back to precision, recall, and nDCG as a form of score.
So, yes, definitely well-established approaches here.
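
In Mahout 0.x terms both kinds of scores are available out of the box; a brief sketch, assuming you already have a RecommenderBuilder for whichever recommender is under test:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

public final class RecommenderScores {

  // Root-mean-squared error of estimated vs. actual ratings (lower is better).
  public static double rmse(RecommenderBuilder builder, DataModel model) throws TasteException {
    RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
    // Train on 70% of each user's preferences, evaluate on the held-out 30%.
    return evaluator.evaluate(builder, null, model, 0.7, 1.0);
  }

  // Precision, recall and nDCG at 10 recommendations.
  public static IRStatistics irStats(RecommenderBuilder builder, DataModel model) throws TasteException {
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(builder, null, model, null, 10,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("precision@10 = " + stats.getPrecision()
        + ", recall@10 = " + stats.getRecall()
        + ", nDCG = " + stats.getNormalizedDiscountedCumulativeGain());
    return stats;
  }
}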

I have this sense that you are saying you have business users who are
going to measure some real-world metric (conversion rate, uplift,
clickthrough), and guess at some changes to algorithm parameters that
might make them better. If you have *that* kind of feedback -- much
better. That is a far more realistic metric. Of course, it's much
harder to experiment when using that metric since you have to run the
algo for a day or something to collect data.
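
If that kind of online feedback is available, the measurement itself is mostly bookkeeping. A sketch of tallying clickthrough per recommender variant from a hypothetical tab-separated event log of "variant, userId, impression|click" lines:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

// Tallies clickthrough rate per recommender variant from a tab-separated event
// log of lines like "variant<TAB>userId<TAB>impression|click". The log file
// name and format are hypothetical, and well-formed lines are assumed.
public final class ClickthroughReport {
  public static void main(String[] args) throws Exception {
    Map<String, long[]> counts = new HashMap<String, long[]>(); // variant -> {impressions, clicks}
    BufferedReader in = new BufferedReader(new FileReader("events.tsv"));
    String line;
    while ((line = in.readLine()) != null) {
      String[] fields = line.split("\t");
      long[] c = counts.get(fields[0]);
      if (c == null) {
        c = new long[2];
        counts.put(fields[0], c);
      }
      if ("impression".equals(fields[2])) {
        c[0]++;
      } else if ("click".equals(fields[2])) {
        c[1]++;
      }
    }
    in.close();
    for (Map.Entry<String, long[]> e : counts.entrySet()) {
      double ctr = e.getValue()[0] == 0 ? 0.0 : (double) e.getValue()[1] / e.getValue()[0];
      System.out.println(e.getKey() + " CTR = " + ctr);
    }
  }
}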

It's a separate question, but I don't know if in the end a business
user can meaningfully decide weights on feature vectors. I mean, I
couldn't eyeball those kinds of things. It may just be how you need to
do things, but would double-check that everyone has a similar and
reasonable expectation about what these inputs are and what they do.


On Tue, Apr 10, 2012 at 3:23 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>
> I see the architecture similar to  the following:
>
> Asynchronously:Given a set of feature vectors    run clustering/classification algorithms for each of our feature vectors to create the appropriate buckets for the set of users, feed the result of these computations into the synchronous database.
> Synchronously:For each bucket run item similarity recommendation algorithms to display a real time set of recommendations for each user
>
> For the asynchronous computations we need the ability to tweak the weights associated with each feature of the feature vectors (typical features might include income/age/dining preferences etc) and we need the business folks to adjust the weights associated with each of these to regenerate the async buckets
>
> So given the above architecture we need the ability for the async computations to judge which algorithm to use based on a set of performance measuring criteria, that was the heart of my initial question, whether folks have built this sort of framework and what are some things to think about when building this.
> Thanks for your feedback
>
>
>
>> Date: Tue, 10 Apr 2012 14:33:56 -0500
>> Subject: Re: Evaluation of recommenders
>> From: srowen@gmail.com
>> To: user@mahout.apache.org
>>
>> You're talking about recommendations now... are we talking about a
>> clustering, classification or recommender system?
>>
>> In general I don't know if it makes sense for business users to be
>> deciding aspects of the internal model. At most someone should input
>> the tradeoffs -- how important is accuracy vs speed? those kinds of
>> things. Then it's an optimization problem. But, understood, maybe you
>> need to let people explore these things manually at first.
>>
>> On Tue, Apr 10, 2012 at 2:21 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>> >
>> > The question really is what are some tried approaches to figure out how to measure the quality of a set of algorithms currently being used for clustering/classification?
>> >
>> > And in thinking about this some more we also need to be able to regenerate models as soon as the business users tweak the weights associated with features inside a feature vector, we need to figure out a way to efficiently tie this into our online workflow which could show updated recommendations every few hours?
>> >
>> > When I say picking an algorithm on the fly what I mean is that we need to continuously test our basket of algorithms based on a new set of training data and make the determination offline as to which of the algorithms to use at that moment to regenerate our recommendations.
>> >> Date: Tue, 10 Apr 2012 14:08:17 -0500
>> >> Subject: Re: Evaluation of recommenders
>> >> From: srowen@gmail.com
>> >> To: user@mahout.apache.org
>> >>
>> >> Picking an algorithm 'on the fly' is almost surely not realistic --
>> >> well, I am not sure what eval process you would run in milliseconds.
>> >> But it's also unnecessary; you usually run evaluations offline on
>> >> training/test data that reflects real input, and then, the resulting
>> >> tuning should be fine for that real input that comes the next day.
>> >>
>> >> Is that really the question, or are you just asking about how you
>> >> measure the quality of clustering or a classifier?
>> >>
>> >> On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>> >> >
>> >> > Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
>> >> > Regards
>> >
>

RE: Evaluation of recommenders

Posted by Saikat Kanjilal <sx...@hotmail.com>.
I see the architecture as similar to the following:

Asynchronously: given a set of feature vectors, run clustering/classification algorithms over them to create the appropriate buckets for the set of users, then feed the results of these computations into the synchronous database.
Synchronously: for each bucket, run item-similarity recommendation algorithms to display a real-time set of recommendations for each user.

For the asynchronous computations we need the ability to tweak the weights associated with each feature of the feature vectors (typical features might include income, age, dining preferences, etc.), and we need the business folks to adjust these weights to regenerate the async buckets.

So, given the above architecture, the async computations need to be able to judge which algorithm to use based on a set of performance-measuring criteria. That was the heart of my initial question: whether folks have built this sort of framework, and what some things to think about are when building it.
Thanks for your feedback.
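
One way the business-supplied weights could be applied, sketched here with Mahout's math vectors (the feature layout and weight values are made up for illustration), is to scale each feature vector element-wise before it is handed to the clustering job; the reweighted vectors then drive the regeneration of the async buckets:

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

// Applies business-supplied per-feature weights before clustering.
// The feature layout (income, age, dining preference score) is illustrative only.
public final class FeatureWeighting {

  public static Vector applyWeights(Vector features, Vector weights) {
    // Element-wise product: a feature with weight 0 is effectively dropped,
    // while a weight > 1 makes that feature dominate the distance computation.
    return features.times(weights);
  }

  public static void main(String[] args) {
    Vector user = new DenseVector(new double[] {55000.0, 34.0, 0.8});
    Vector weights = new DenseVector(new double[] {0.5, 1.0, 2.0}); // set by business users
    System.out.println(applyWeights(user, weights));
  }
}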



> Date: Tue, 10 Apr 2012 14:33:56 -0500
> Subject: Re: Evaluation of recommenders
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> You're talking about recommendations now... are we talking about a
> clustering, classification or recommender system?
> 
> In general I don't know if it makes sense for business users to be
> deciding aspects of the internal model. At most someone should input
> the tradeoffs -- how important is accuracy vs speed? those kinds of
> things. Then it's an optimization problem. But, understood, maybe you
> need to let people explore these things manually at first.
> 
> On Tue, Apr 10, 2012 at 2:21 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >
> > The question really is what are some tried approaches to figure out how to measure the quality of a set of algorithms currently being used for clustering/classification?
> >
> > And in thinking about this some more we also need to be able to regenerate models as soon as the business users tweak the weights associated with features inside a feature vector, we need to figure out a way to efficiently tie this into our online workflow which could show updated recommendations every few hours?
> >
> > When I say picking an algorithm on the fly what I mean is that we need to continuously test our basket of algorithms based on a new set of training data and make the determination offline as to which of the algorithms to use at that moment to regenerate our recommendations.
> >> Date: Tue, 10 Apr 2012 14:08:17 -0500
> >> Subject: Re: Evaluation of recommenders
> >> From: srowen@gmail.com
> >> To: user@mahout.apache.org
> >>
> >> Picking an algorithm 'on the fly' is almost surely not realistic --
> >> well, I am not sure what eval process you would run in milliseconds.
> >> But it's also unnecessary; you usually run evaluations offline on
> >> training/test data that reflects real input, and then, the resulting
> >> tuning should be fine for that real input that comes the next day.
> >>
> >> Is that really the question, or are you just asking about how you
> >> measure the quality of clustering or a classifier?
> >>
> >> On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >> >
> >> > Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
> >> > Regards
> >

Re: Evaluation of recommenders

Posted by Sean Owen <sr...@gmail.com>.
You're talking about recommendations now... are we talking about a
clustering, classification or recommender system?

In general I don't know if it makes sense for business users to be
deciding aspects of the internal model. At most someone should input
the tradeoffs -- how important is accuracy vs speed? those kinds of
things. Then it's an optimization problem. But, understood, maybe you
need to let people explore these things manually at first.
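
A very small illustration of the "input the tradeoffs" idea (nothing standard, just a sketch): fold accuracy and speed into a single score with one user-chosen weight and rank candidate configurations by it:

// Scalarizes the accuracy-vs-speed tradeoff into one number so candidate
// configurations can be ranked automatically. The speedWeight knob is the
// only thing a business user would set; the formula and values are illustrative.
public final class TradeoffScore {
  public static double score(double accuracy, double latencySeconds, double speedWeight) {
    // Higher accuracy is better; latency is penalized in proportion to how
    // much the user says speed matters.
    return accuracy - speedWeight * latencySeconds;
  }

  public static void main(String[] args) {
    // A slightly less accurate but much faster candidate can win when speed is weighted heavily.
    System.out.println(score(0.82, 0.4, 0.5)); // 0.62
    System.out.println(score(0.85, 2.0, 0.5)); // -0.15
  }
}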

On Tue, Apr 10, 2012 at 2:21 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>
> The question really is what are some tried approaches to figure out how to measure the quality of a set of algorithms currently being used for clustering/classification?
>
> And in thinking about this some more we also need to be able to regenerate models as soon as the business users tweak the weights associated with features inside a feature vector, we need to figure out a way to efficiently tie this into our online workflow which could show updated recommendations every few hours?
>
> When I say picking an algorithm on the fly what I mean is that we need to continuously test our basket of algorithms based on a new set of training data and make the determination offline as to which of the algorithms to use at that moment to regenerate our recommendations.
>> Date: Tue, 10 Apr 2012 14:08:17 -0500
>> Subject: Re: Evaluation of recommenders
>> From: srowen@gmail.com
>> To: user@mahout.apache.org
>>
>> Picking an algorithm 'on the fly' is almost surely not realistic --
>> well, I am not sure what eval process you would run in milliseconds.
>> But it's also unnecessary; you usually run evaluations offline on
>> training/test data that reflects real input, and then, the resulting
>> tuning should be fine for that real input that comes the next day.
>>
>> Is that really the question, or are you just asking about how you
>> measure the quality of clustering or a classifier?
>>
>> On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>> >
>> > Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
>> > Regards
>

RE: Evaluation of recommenders

Posted by Saikat Kanjilal <sx...@hotmail.com>.
The question really is: what are some tried approaches for measuring the quality of a set of algorithms currently being used for clustering/classification?

And in thinking about this some more, we also need to be able to regenerate models as soon as the business users tweak the weights associated with features inside a feature vector, and we need to figure out a way to efficiently tie this into our online workflow, which could show updated recommendations every few hours.

When I say picking an algorithm on the fly, what I mean is that we need to continuously test our basket of algorithms against a new set of training data and determine offline which of the algorithms to use at that moment to regenerate our recommendations.
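
For the clustering half of that question, one common family of answers is an internal measure of tightness versus separation that can be recomputed after every re-clustering run. A generic sketch in plain Java (not a Mahout API):

// A simple internal clustering-quality score: mean distance of points to
// their assigned centroid, divided by the smallest distance between two
// centroids. Lower is better (tight, well-separated clusters).
public final class ClusterQuality {

  public static double score(double[][] points, int[] assignment, double[][] centroids) {
    double scatter = 0.0;
    for (int i = 0; i < points.length; i++) {
      scatter += distance(points[i], centroids[assignment[i]]);
    }
    scatter /= points.length;

    double minSeparation = Double.MAX_VALUE;
    for (int a = 0; a < centroids.length; a++) {
      for (int b = a + 1; b < centroids.length; b++) {
        minSeparation = Math.min(minSeparation, distance(centroids[a], centroids[b]));
      }
    }
    return scatter / minSeparation;
  }

  private static double distance(double[] x, double[] y) {
    double sum = 0.0;
    for (int d = 0; d < x.length; d++) {
      sum += (x[d] - y[d]) * (x[d] - y[d]);
    }
    return Math.sqrt(sum);
  }
}
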
> Date: Tue, 10 Apr 2012 14:08:17 -0500
> Subject: Re: Evaluation of recommenders
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> Picking an algorithm 'on the fly' is almost surely not realistic --
> well, I am not sure what eval process you would run in milliseconds.
> But it's also unnecessary; you usually run evaluations offline on
> training/test data that reflects real input, and then, the resulting
> tuning should be fine for that real input that comes the next day.
> 
> Is that really the question, or are you just asking about how you
> measure the quality of clustering or a classifier?
> 
> On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >
> > Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
> > Regards

Re: Evaluation of recommenders

Posted by Sean Owen <sr...@gmail.com>.
Picking an algorithm 'on the fly' is almost surely not realistic --
well, I am not sure what eval process you would run in milliseconds.
But it's also unnecessary; you usually run evaluations offline on
training/test data that reflects real input, and then, the resulting
tuning should be fine for that real input that comes the next day.
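
A concrete way to build that kind of training/test data, assuming timestamped preferences in a hypothetical "userID,itemID,rating,timestampMillis" file, is to split at a time cutoff so the model is trained on the past and evaluated against what actually arrived afterwards:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

// Splits a "userID,itemID,rating,timestampMillis" preference log at a time
// cutoff (passed as the first argument, e.g. midnight before the evaluation
// day). File names and the log format are assumptions for this sketch.
public final class TimeSplit {
  public static void main(String[] args) throws Exception {
    long cutoff = Long.parseLong(args[0]);
    BufferedReader in = new BufferedReader(new FileReader("ratings_with_time.csv"));
    PrintWriter train = new PrintWriter(new FileWriter("train.csv"));
    PrintWriter test = new PrintWriter(new FileWriter("test.csv"));
    String line;
    while ((line = in.readLine()) != null) {
      long timestamp = Long.parseLong(line.substring(line.lastIndexOf(',') + 1).trim());
      if (timestamp < cutoff) {
        train.println(line); // everything before the cutoff trains the model
      } else {
        test.println(line);  // everything after the cutoff is the "next day" to predict
      }
    }
    in.close();
    train.close();
    test.close();
  }
}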

Is that really the question, or are you just asking about how you
measure the quality of clustering or a classifier?

On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>
> Hi everyone,We're looking at building out some clustering and classification algorithms using mahout and one of the things we're also looking at doing is to build performance metrics around each of these algorithms, as we go down the path of choosing the best model in an iterative closed feedback loop (i.e. our business users manipulate weights for each attribute for our feature vectors->we use these changes to regenerate an asynchronous model using the appropriate clustering/classification algorithms and then replenish our online component with this newly recalculated data for fresh recommendations).   So our end goal is to have a basket of algorithms and use a set of performance metrics to pick and choose the right algorithm on the fly.  I was wondering if anyone has done this type of analysis before and if so are there approaches that have worked well and approaches that haven't when it comes to measuring the "quality" of each of the recommendation algorithms.
> Regards