Posted to user@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2014/04/06 19:26:28 UTC

Solr+Mahout Recommender Demo Site

After having integrated several versions of the Mahout and Myrrix recommenders at fairly large scale, I was interested in solving three problems that they did not directly address:
1) realtime queries for recs using data not yet incorporated into the training set. Myrrix allows this but Mahout, using the Hadoop MapReduce version, does not.
2) cross-recommendations from two or more action types (say purchase and detail-view)
3) blending metadata and user preference data to return recs (for example category & user preferences => recs)

Using Solr + Mahout provided an amazingly flexible and performant way to do this. Ted wrote about his experience with this basic approach in his recent book. Take user preferences, run them through RowSimilarityJob, and you get an item-by-item similarity matrix. This is the core of an item-based cooccurrence recommender. If you take the similarity matrix and convert it into a list of tokens per row, you have something Solr can index. If you then use a user’s history as a query on the indexed data, you get an ordered list of recommendations.
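
A minimal sketch of that pipeline in plain Python, not Mahout or Solr. It assumes raw cooccurrence counts as a stand-in for the similarity scores RowSimilarityJob would produce, and a simple token-overlap count as a stand-in for Solr's relevance ranking. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_indicators(user_prefs, top_k=2):
    """Return {item: [similar item ids]} from {user: set of liked items}.

    Raw cooccurrence counts are an assumed stand-in for the similarity
    scores a job like RowSimilarityJob would compute.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_prefs.values():
        # Every ordered pair of items liked by the same user cooccurs once.
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    indicators = {}
    for item, row in counts.items():
        # Keep the top_k strongest cooccurrences as the item's "tokens".
        ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
        indicators[item] = [other for other, _ in ranked[:top_k]]
    return indicators


def recommend(indicators, history, n=3):
    """Rank items by how many of their indicator tokens appear in history.

    This overlap count only imitates what a search engine query would do.
    """
    history = set(history)
    scores = {}
    for item, tokens in indicators.items():
        if item in history:
            continue  # do not recommend what the user already has
        overlap = len(history.intersection(tokens))
        if overlap:
            scores[item] = overlap
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


prefs = {
    "u1": {"alien", "aliens", "predator"},
    "u2": {"alien", "aliens"},
    "u3": {"aliens", "predator", "airplane"},
    "u4": {"airplane", "naked_gun"},
}
indicators = build_indicators(prefs)
recs = recommend(indicators, ["alien"])
```

Each entry in `indicators` plays the role of one indexed document: the item id plus a field of similar-item tokens. The user's history is the query against that field.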

When I set out to do #1 and #3, the first problem was the need for both CF data and metadata, so I mined the web for video reviews and video metadata. Logging the actions of users who visit the site will then provide data for #2 and #1.

The demo site is https://guide.finderbots.com and instructions are at the end of this message for anyone who would like to test it out. As a crude user test, there is a procedure we ask you to follow to help gather data on recommendation quality. It’s running out of my closet over Comcast, so if it’s down I may have tripped over a cord. Sorry, try again later.

There are a bunch of different methods for making recs illustrated on the site. One method that illustrates blending metadata uses preference data from you, plus metadata, to bias and filter recs. Imagine that you have trained the system with your preferences by making some video picks. Now imagine you’d like to get recommendations for Comedies from Netflix based on your previous video preferences. This is done with a single Solr query on indexed video fields that hold genre, similar videos (from the similarity matrix), and sources. The query finds similar videos to the ones you have liked, with the genre “Comedy” boosted by some amount, but only those that have at least one source = “Netflix”.
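
A rough sketch of what such a query could look like, built as Solr select parameters in Python. The field names (`similar_videos`, `genre`, `sources`), the core name `videos`, the host, and the boost value are all assumptions for illustration; the site's actual schema is not given here. It relies only on standard Solr query syntax: a required clause (`+`), a term boost (`^`), and a filter query (`fq`).

```python
from urllib.parse import urlencode


def build_rec_query(liked_ids, genre, source, genre_boost=2.0, rows=24):
    """Build Solr select parameters for metadata-blended recs.

    All field names here are hypothetical.
    """
    similar = " ".join(liked_ids)
    # Required clause: the video's similar-items field must match at least
    # one liked video. Optional clause: a genre match only raises the score.
    q = f'+similar_videos:({similar}) genre:"{genre}"^{genre_boost}'
    return {
        "q": q,
        # Filter query: restricts results without affecting the ranking.
        "fq": f'sources:"{source}"',
        "rows": rows,
        "wt": "json",
    }


params = build_rec_query(["v101", "v205"], "Comedy", "Netflix")
# Hypothetical host and core name.
url = "http://localhost:8983/solr/videos/select?" + urlencode(params)
```

Putting the source in `fq` rather than `q` keeps it a pure filter, so it narrows the result set while the ranking still comes from the preference and genre clauses.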

I’ll be doing some blog posts covering the specifics of how each rec type is done, the site and DB architecture, and Solr setup.

The project uses the Solr recommender prep code here: https://github.com/pferrel/solr-recommender

BTW I plan to publish obfuscated usage data in the GitHub repo.

begin form letter =======================================

Please use a recently updated browser (latest Firefox, Chrome, or Safari, and nothing older than IE10). The site doesn’t yet check browser compatibility but relies rather heavily on HTML5 and CSS3.

1) go to https://guide.finderbots.com/users/sign_up to create an account
2) go to https://guide.finderbots.com/trainers to ‘train’ the recommender: hit thumbs up on videos you like. There are 20 pages of training videos. You can leave at any time, but if you can go through them all it would be appreciated.
3) go to https://guide.finderbots.com/guides/recommend to immediately get personalized recs from your training data. If you completed the trainer, check the top line of recs and count how many are videos you liked or would like to see. Scroll right or left to see a total of 24 in four batches of 6. If you could report to me the total you thought were good recs, it would be greatly appreciated.
4) browse videos by various criteria here: https://guide.finderbots.com/guides These are not recommendations, they are simply a catalog.
5) control how you browse videos by clicking the gears icon. You can set all videos to be from one or more sources here. If you choose Netflix alone (don’t forget to uncheck ‘all’) then recs and browsed videos will all be available on Netflix.

Re: Solr+Mahout Recommender Demo Site

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Exactly so. Recording different types of input was one reason to build a site that looked good enough to feasibly get a little traffic.

For instance, one of the recommendation types is “Based on videos you recently viewed.” You see these on detail pages as well as the recommend page. These detail page views are put in log files but also recorded in realtime for queries. The recently viewed videos are used as a query on the similarity/indicator matrix. This yields somewhat weak results IMO, partly because of the mismatch in actions: you are trying to recommend from the “liked videos” indicator matrix using “viewed detail page” actions. This is a case for the cross-recommender, but I don’t have enough detail views yet to calculate a cross-indicator matrix, so I make do with the one I have.
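
To make the mismatch concrete, here is a rough sketch of what a cross-indicator could look like: for each liked video, count which viewed videos show up in the same users' histories. Raw counts are an assumption standing in for whatever similarity measure a cross-recommender would actually apply, and the function name and data are hypothetical.

```python
from collections import defaultdict


def cross_indicators(likes, views, top_k=2):
    """Map each liked item to the viewed items that co-occur with it most.

    likes, views: {user: set of item ids} for the two action types.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, liked in likes.items():
        for liked_item in liked:
            # Pair every like with every detail view by the same user.
            for viewed_item in views.get(user, ()):
                counts[liked_item][viewed_item] += 1
    return {
        item: [v for v, _ in
               sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for item, row in counts.items()
    }


likes = {"u1": {"alien"}, "u2": {"alien", "airplane"}}
views = {"u1": {"aliens", "predator"}, "u2": {"aliens"}}
xind = cross_indicators(likes, views)
```

Indexed as an extra field on each video, these tokens could be queried with a user's recent detail views, so that view actions are matched against view-derived indicators rather than like-derived ones.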

The analogy of a shopping cart recommender might be the watchlist on the site. A user’s watchlist indicates an item-set that interests the user. Once enough of these are collected, Solr will quite easily allow for queries against everyone’s watchlists using the user’s watchlist as the query. This is not as strong a signal as buying things together in a shopping cart but may still be of value. When you go to your watchlist page (not really implemented yet) you’d see other videos from similar watchlists. This type of query could be combined with the watchlist as a query on the liked-video indicator matrix to give better results.

Other actions, like search terms, can also be used. An indicator matrix of search terms and videos clicked could be blended with full-text search to get personalized search results, again given enough usage.

The site is equipped to gather all of this data if there is enough traffic.

On Apr 6, 2014, at 10:33 PM, Ted Dunning <te...@gmail.com> wrote:

On Mon, Apr 7, 2014 at 5:18 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Combining this kind of metadata with CF data has been important to the big
> guys but elusive to the rest of us. And a recommender that seamlessly
> integrates the different methods is rare. Solr + Mahout does it better than
> anything I’ve seen on the OSS or pay software market.
> 

Combining with meta-data is a huge deal.

Frankly, having many kinds of indicators in the index so that you can mix
and match is big as well (maybe half as big).  This lets you tune the
weight of different kinds of input.

Re: Solr+Mahout Recommender Demo Site

Posted by Ted Dunning <te...@gmail.com>.

On Mon, Apr 7, 2014 at 5:18 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Combining this kind of metadata with CF data has been important to the big
> guys but elusive to the rest of us. And a recommender that seamlessly
> integrates the different methods is rare. Solr + Mahout does it better than
> anything I’ve seen on the OSS or pay software market.
>

Combining with meta-data is a huge deal.

Frankly, having many kinds of indicators in the index so that you can mix
and match is big as well (maybe half as big).  This lets you tune the
weight of different kinds of input.

Re: Solr+Mahout Recommender Demo Site

Posted by Pat Ferrel <pa...@occamsmachete.com>.

BTW this isn’t an attempt to show off, it’s an attempt to start a conversation about fast scalable hybrid recommendations—content-based + collaborative filtering recommenders. 

Anyone who has started a business that uses a recommender has had to deal with the ‘cold-start’ problem: no preference data for CF. Usually the answer to this is a content-based recommender. Find similar items based on metadata or content and you can attempt to recommend even without usage data.

But even later when you have CF data, metadata or content can play an important part. All the big guys place a great deal of weight on this. Netflix went as far as to create thousands of micro-genres for videos, and the Music Genome Project does something similar for music.

Combining this kind of metadata with CF data has been important to the big guys but elusive to the rest of us. And a recommender that seamlessly integrates the different methods is rare. Solr + Mahout does it better than anything I’ve seen on the OSS or pay software market. 

On Apr 6, 2014, at 5:04 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

As I said below RSJ is actually all that is needed. But with the entire recommender also integrated we can compare the two in the demo framework. For instance one of the lines of recs on a video detail page (the top one) is the actual RSJ output. When I get time, the recommend page will have a line of precalculated recs from the Mahout item recommender since those are already being generated. It will be interesting to see them side by side, could even form an A/B test around that if there were any traffic.

One thing I’ve noticed is that Solr recs are so much more flexible, especially when blended with metadata, that I can’t imagine wanting to go back to the old way. Even if the Mahout precalculated recs were marginally better, the Solr method allows you to fill pages with recs biased in different ways. It’s almost like turning the catalog browser into one customized by the user’s preferences.

BTW dithering and anti-repeat/anti-flood are implemented on the recommend page. Dithering is done with varying lambdas; very high values are used on lists that seldom change, like “Recently Popular”.

On Apr 6, 2014, at 4:28 PM, Ted Dunning <te...@gmail.com> wrote:

This can actually be simplified a bit by using ItemSimilarityJob to call
RowSimilarityJob.

Nice work overall.

On Sun, Apr 6, 2014 at 10:21 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Pat, do you still want help putting this into a new mahout/examples, or
> work out how to do the distribution via "github pointer"?  There's an open
> bug for that.
> 
>> On Apr 6, 2014, at 1:13 PM, Sebastian Schelter <ss...@apache.org> wrote:
>> 
>> The top 3 recommendations "based on videos you liked" are very good!
>> 
>> Nice job.
>> 
>> 

Re: Solr+Mahout Recommender Demo Site

Posted by Ted Dunning <te...@gmail.com>.

On Mon, Apr 7, 2014 at 2:04 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> As I said below RSJ is actually all that is needed. But with the entire
> recommender also integrated we can compare the two in the demo framework.
> For instance one of the lines of recs on a video detail page (the top one)
> is the actual RSJ output. When I get time, the recommend page will have a
> line of precalculated recs from the Mahout item recommender since those are
> already being generated. It will be interesting to see them side by side,
> could even form an A/B test around that if there were any traffic.
>

ItemSimilarityJob is actually just a wrapper around RowSimilarityJob that
also computes some similarities in the end (which I always ignore).  The
nice thing is that it does most of the format conversions for us.

If you ignore the weights that come out the far end, ISJ produces the same
results as RSJ.

> ....
> BTW dithering and anti-repeat/anti-flood are implemented on the recommend
> page. Dithering is done with varying lambdas, very high values are used on
> lists that change seldom, like “Recently Popular”.
>

Exactly the right practice.

Re: Solr+Mahout Recommender Demo Site

Posted by Pat Ferrel <pa...@occamsmachete.com>.

As I said below RSJ is actually all that is needed. But with the entire recommender also integrated we can compare the two in the demo framework. For instance one of the lines of recs on a video detail page (the top one) is the actual RSJ output. When I get time, the recommend page will have a line of precalculated recs from the Mahout item recommender since those are already being generated. It will be interesting to see them side by side, could even form an A/B test around that if there were any traffic.

One thing I’ve noticed is that Solr recs are so much more flexible, especially when blended with metadata, that I can’t imagine wanting to go back to the old way. Even if the Mahout precalculated recs were marginally better, the Solr method allows you to fill pages with recs biased in different ways. It’s almost like turning the catalog browser into one customized by the user’s preferences.

BTW dithering and anti-repeat/anti-flood are implemented on the recommend page. Dithering is done with varying lambdas; very high values are used on lists that seldom change, like “Recently Popular”.
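
The message does not say how lambda enters the calculation. As a sketch under stated assumptions: one commonly described form of dithering re-sorts a ranked list by log(rank) plus Gaussian noise, so the top stays mostly stable while deeper items occasionally surface. The `noise_scale` parameter below is a hypothetical stand-in for the lambda mentioned above, with larger values shuffling more. This is not the site's implementation.

```python
import math
import random


def dither(ranked_items, noise_scale, seed=None):
    """Reorder a ranked list by log(rank) plus Gaussian noise.

    noise_scale is an assumed knob: 0 leaves the order unchanged,
    larger values mix the list more aggressively.
    """
    rng = random.Random(seed)
    noisy = [
        (math.log(rank) + rng.gauss(0.0, noise_scale), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    noisy.sort(key=lambda pair: pair[0])
    return [item for _, item in noisy]


recs = [f"video{i}" for i in range(1, 25)]
page = dither(recs, noise_scale=0.5, seed=42)
```

Because log(rank) grows slowly, a fixed amount of noise moves items at rank 20 much further than items at rank 2. A list that rarely changes would need a larger noise scale to look fresh on repeat visits, which matches the practice described above.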


On Apr 6, 2014, at 4:28 PM, Ted Dunning <te...@gmail.com> wrote:

This can actually be simplified a bit by using ItemSimilarityJob to call
RowSimilarityJob.

Nice work overall.


On Sun, Apr 6, 2014 at 10:21 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Pat, do you still want help putting this into a new mahout/examples, or
> work out how to do the distribution via "github pointer"?  There's an open
> bug for that.
> 
>> On Apr 6, 2014, at 1:13 PM, Sebastian Schelter <ss...@apache.org> wrote:
>> 
>> The top 3 recommendations "based on videos you liked" are very good!
>> 
>> Nice job.
>> 
>> 

Re: Solr+Mahout Recommender Demo Site

Posted by Ted Dunning <te...@gmail.com>.

This can actually be simplified a bit by using ItemSimilarityJob to call
RowSimilarityJob.

Nice work overall.


On Sun, Apr 6, 2014 at 10:21 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Pat, do you still want help putting this into a new mahout/examples, or
> work out how to do the distribution via "github pointer"?  There's an open
> bug for that.
>
> > On Apr 6, 2014, at 1:13 PM, Sebastian Schelter <ss...@apache.org> wrote:
> >
> > The top 3 recommendations "based on videos you liked" are very good!
> >
> > Nice job.
> >
> >

Re: Solr+Mahout Recommender Demo Site

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Yes. It still needs some work; the GitHub repo is hard to use without a better explanation of Solr integration. It gets you most of the way there without a clear idea of how to do the rest.

Also thinking about porting to Spark since all it really needs is RSJ and Matrix Multiply, not the entire recommender and cross-recommender.

On Apr 6, 2014, at 1:21 PM, Andrew Musselman <an...@gmail.com> wrote:

Pat, do you still want help putting this into a new mahout/examples, or work out how to do the distribution via "github pointer"?  There's an open bug for that.

> On Apr 6, 2014, at 1:13 PM, Sebastian Schelter <ss...@apache.org> wrote:
> 
> The top 3 recommendations "based on videos you liked" are very good!
> 
> Nice job.
> 
> 

Re: Solr+Mahout Recommender Demo Site

Posted by Andrew Musselman <an...@gmail.com>.

Pat, do you still want help putting this into a new mahout/examples, or work out how to do the distribution via "github pointer"?  There's an open bug for that.


Re: Solr+Mahout Recommender Demo Site

Posted by Sebastian Schelter <ss...@apache.org>.

The top 3 recommendations "based on videos you liked" are very good!

Nice job.


Re: Solr+Mahout Recommender Demo Site

Posted by Ted Dunning <te...@gmail.com>.

It looks like it works well.

And it is gorgeous as well.

Nice work.  Very nice.



Re: Solr+Mahout Recommender Demo Site

Posted by SriSatish Ambati <sr...@0xdata.com>.

It's quite good. Sri

