Posted to dev@mahout.apache.org by Saikat Kanjilal <sx...@hotmail.com> on 2014/04/26 17:25:50 UTC

Solr recommender

Pat,
I was wondering if you'd given any thought to genericizing the Solr recommender to work with both Solr and Elasticsearch. Namely, are there pieces of the recommender that could plug into, or be lifted above, a search engine (or, in the case of Elasticsearch, a set of REST APIs)? I would be very interested in helping out with this.

Thoughts?

Sent from my iPad

Fwd: Solr recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Begin forwarded message:

From: Pat Ferrel <pa...@gmail.com>
Subject: Re: Solr recommender
Date: April 26, 2014 at 9:07:57 AM PDT
To: dev@mahout.apache.org

Yes, it already does. It’s not named well; all it really does is create an indicator matrix (item-item similarity using LLR, the log-likelihood ratio) in a form that is digestible by a text indexer. You could use Solr or Elasticsearch to do the indexing and queries.
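
To make "an indicator matrix in a form digestible by a text indexer" concrete, here is a minimal Python sketch. It is not the Mahout job's code: the function names, the threshold, and the toy history are assumptions for illustration. The score is the standard log-likelihood ratio on a 2x2 cooccurrence table, and the output is one space-delimited string of similar item ids per item, which any text indexer can tokenize.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy term used by the LLR formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of cooccurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def indicator_docs(history, threshold=1.0):
    """Map each item id to a space-delimited string of similar item ids."""
    users_by_item = defaultdict(set)
    for user, items in history.items():
        for item in items:
            users_by_item[item].add(user)
    n_users = len(history)
    similar = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        both = len(users_by_item[a] & users_by_item[b])
        only_a = len(users_by_item[a]) - both
        only_b = len(users_by_item[b]) - both
        neither = n_users - both - only_a - only_b
        score = llr(both, only_a, only_b, neither)
        # Require at least one cooccurrence; the threshold is an assumed cutoff.
        if both > 0 and score >= threshold:
            similar[a].append((score, b))
            similar[b].append((score, a))
    # Strongest indicators first, joined as plain text for the indexer.
    return {
        item: " ".join(other for _, other in sorted(pairs, reverse=True))
        for item, pairs in similar.items()
    }


# Hypothetical "likes" history: user id -> video ids.
history = {
    "u1": ["v1", "v2"],
    "u2": ["v1", "v2"],
    "u3": ["v1", "v2"],
    "u4": ["v3"],
    "u5": ["v3"],
    "u6": ["v3", "v4"],
}
docs = indicator_docs(history)
print(docs)
```

Each value in `docs` would become the similar-items field of the corresponding item document, whether that document lives in a DB row or a raw text file.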

In the actual installation on the demo site, https://guide.finderbots.com, the indicator matrix is put into a DB and Solr is used to index the item collection’s similarity data field. The queries are handled by the web app framework. If I swapped out Solr for Elasticsearch for indexing the DB, it would work just fine; I looked into how to integrate it with my web app framework (RoR). The integration methods were significantly different, though, so I chose not to do both.

I chose to put the indicator matrix in the DB because it makes it very convenient to mix metadata into the recs queries. In the case of the demo site, where the items are videos, I have a bunch of recommendation types:
1) User-history-based recs: the query is the user’s recent “likes” history, run against the videos collection’s similar-items field, which is a list of video id strings. This is what people usually think a recommender does, but it is only the start.
2-9) These use various methods of biasing the results by genre metadata. Search engines also allow filtering by fields, so you can specify videos filtered by source. For example, you can get comedies based on your “likes”, filtered by source = Netflix. In fact, when you set the source filter to Netflix, every set of recs will contain only videos on Netflix.
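
As a rough illustration of how the history query, a genre bias, and a source filter could be combined, the sketch below builds an Elasticsearch-style bool query body in Python. The field names (`similar_items`, `genres`, `source`) and the boost value are assumptions, since the demo site's schema is not given here, and the `filter` clause assumes a reasonably recent Elasticsearch query DSL. Nothing is sent to a server; the function only builds the request body.

```python
import json


def recs_query(liked_ids, bias_genre=None, source=None, size=10):
    """Build an Elasticsearch-style bool query body for recs."""
    bool_query = {
        # The user's recent "likes" are matched against each video's
        # indicator field (item ids stored as whitespace-separated text).
        "must": [{"match": {"similar_items": " ".join(liked_ids)}}],
    }
    if bias_genre is not None:
        # "should" raises the score of matching videos without excluding others.
        bool_query["should"] = [
            {"term": {"genres": {"value": bias_genre, "boost": 2.0}}}
        ]
    if source is not None:
        # "filter" restricts the result set and does not affect scoring.
        bool_query["filter"] = [{"term": {"source": source}}]
    return {"size": size, "query": {"bool": bool_query}}


# Comedies based on the user's likes, restricted to one source.
query = recs_query(["v1", "v2"], bias_genre="comedy", source="Netflix")
print(json.dumps(query, indent=2))
```

The same shape carries over to Solr, where the history would go in `q` and the source restriction in `fq`.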

There are so many ways to combine bias, filter, and what you use as the query that putting the fields in a DB made the most sense. I am still thinking of new ways to use this. One example is item-set similarity, which is used to give shopping cart recs in some systems. On the demo site you could do the same with the watchlist if there were enough watchlists: use the user’s watchlist as the query against all other watchlists, get back an ordered set of the watchlists most similar to yours, and take recs from there.
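
The watchlist idea can be sketched in plain Python. This is only a stand-in: it ranks watchlists by Jaccard overlap, whereas a search engine would rank with its own relevance scoring, so the ordering could differ. The function names and the sample watchlists are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two item sets, from 0.0 (disjoint) to 1.0 (equal)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recs_from_watchlists(mine, others, top_n=2):
    """Rank other watchlists by overlap with `mine`, then collect unseen items."""
    ranked = sorted(
        others.items(), key=lambda kv: jaccard(mine, kv[1]), reverse=True
    )
    recs = []
    for _, items in ranked[:top_n]:
        for item in items:
            if item not in mine and item not in recs:
                recs.append(item)
    return recs


# Hypothetical watchlists keyed by user.
others = {
    "alice": ["v1", "v2", "v5"],
    "bob": ["v7", "v8"],
    "carol": ["v1", "v2", "v3", "v6"],
}
recs = recs_from_watchlists(["v1", "v2", "v3"], others)
print(recs)
```

With a search engine, each watchlist would be indexed as a document and the user's own watchlist would be the query text.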

Some day I’ll write some blog posts about it but I’d encourage anyone with data to try the DB route rather than raw indexing of the text files just for the amazing flexibility and convenience it brings.

On Apr 26, 2014, at 8:25 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:

Pat,
I was wondering if you'd given any thought to genericizing the Solr recommender to work with both Solr and Elasticsearch. Namely, are there pieces of the recommender that could plug into, or be lifted above, a search engine (or, in the case of Elasticsearch, a set of REST APIs)? I would be very interested in helping out with this.

Thoughts?

Sent from my iPad



Re: Solr recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
True, making your project independent. That should already work so go for it.

On Apr 26, 2014, at 10:21 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:

That shouldn't technically matter; my thought is to create a Spring-based Elasticsearch recommender that leverages Spark cooccurrence underneath.

Sent from my iPad



Re: Solr recommender

Posted by Saikat Kanjilal <sx...@hotmail.com>.
That shouldn't technically matter; my thought is to create a Spring-based Elasticsearch recommender that leverages Spark cooccurrence underneath.

Sent from my iPad

> On Apr 26, 2014, at 10:07 AM, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> 
> Oh, and the example is old Hadoop MapReduce; we’re redoing this with the new Spark cooccurrence code, which will replace the ItemSimilarity job.
> 

Re: Solr recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Oh, and the example is old Hadoop MapReduce; we’re redoing this with the new Spark cooccurrence code, which will replace the ItemSimilarity job.

On Apr 26, 2014, at 10:03 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

If you want, fork the GitHub repo, do the integration, and create a pull request. If the pull request is accepted, it will automatically be included in the Mahout build’s examples.

Some things to consider:
1) It is actually easier to use the Solr, Lucid, or Elasticsearch web GUI for bare-bones illustration purposes. You’d have to enter the recs query by hand. For demo purposes, some example queries could be created ahead of time to illustrate the recs-generating queries. I did this myself but didn’t include it in the example. I’d actually recommend this as a simple illustration.
2) I suspect the Solr+DB integration route would be the most common way people would actually use this, but I could be wrong. This is what I did on the demo site, but it is far beyond what you’d put in an example.
3) What data to use? Unless the data has human-readable item ids, the demo is not as compelling.

I can’t give you the demo site’s data since I mined the web for it, which allows me to use it but I don’t think I can republish it. Data actually gathered on the site by users I could share but there isn’t enough to work with. Maybe Ted has some from his demo.




Re: Solr recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
If you want, fork the GitHub repo, do the integration, and create a pull request. If the pull request is accepted, it will automatically be included in the Mahout build’s examples.

Some things to consider:
1) It is actually easier to use the Solr, Lucid, or Elasticsearch web GUI for bare-bones illustration purposes. You’d have to enter the recs query by hand. For demo purposes, some example queries could be created ahead of time to illustrate the recs-generating queries. I did this myself but didn’t include it in the example. I’d actually recommend this as a simple illustration.
2) I suspect the Solr+DB integration route would be the most common way people would actually use this, but I could be wrong. This is what I did on the demo site, but it is far beyond what you’d put in an example.
3) What data to use? Unless the data has human-readable item ids, the demo is not as compelling.
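
For point 1, a pre-built example query might look like the URL produced by the Python sketch below. It uses the standard Solr request parameters `q`, `fq`, `rows`, and `wt`. The core name `videos`, the field names `similar_items` and `source`, and the localhost base URL are assumptions for illustration only; the function builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_recs_url(liked_ids, source=None, rows=10,
                  base="http://localhost:8983/solr/videos/select"):
    """Build a Solr select URL that queries the indicator field."""
    params = [
        # The user's liked item ids are OR-ed against the similar-items field.
        ("q", "similar_items:(" + " ".join(liked_ids) + ")"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if source:
        # A filter query restricts results without changing their scores.
        params.append(("fq", 'source:"' + source + '"'))
    return base + "?" + urlencode(params)


url = solr_recs_url(["v1", "v2"], source="Netflix")
print(url)
```

A handful of such URLs, one per recommendation type, could be pasted into a browser or the admin GUI to show the recs without any web app code.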

I can’t give you the demo site’s data since I mined the web for it, which allows me to use it but I don’t think I can republish it. Data actually gathered on the site by users I could share but there isn’t enough to work with. Maybe Ted has some from his demo.

On Apr 26, 2014, at 9:18 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:



Sent from my iPad



Re: Solr recommender

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Sent from my iPad

> On Apr 26, 2014, at 9:18 AM, "Saikat Kanjilal" <sx...@hotmail.com> wrote:
> 
> Is it worth adding the Elasticsearch piece to the demo and tying that into a generic MVC framework like Spring? In fact, we could leverage Spring Data's Elasticsearch plugin.
> 
> Sent from my iPad
> 

Re: Solr recommender

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Is it worth adding the Elasticsearch piece to the demo and tying that into a generic MVC framework like Spring? In fact, we could leverage Spring Data's Elasticsearch plugin.

Sent from my iPad

> On Apr 26, 2014, at 9:08 AM, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> 
> Yes, it already does. It’s not named well; all it really does is create an indicator matrix (item-item similarity using LLR, the log-likelihood ratio) in a form that is digestible by a text indexer. You could use Solr or Elasticsearch to do the indexing and queries.
> 
> In the actual installation on the demo site, https://guide.finderbots.com, the indicator matrix is put into a DB and Solr is used to index the item collection’s similarity data field. The queries are handled by the web app framework. If I swapped out Solr for Elasticsearch for indexing the DB, it would work just fine; I looked into how to integrate it with my web app framework (RoR). The integration methods were significantly different, though, so I chose not to do both.
> 
> I chose to put the indicator matrix in the DB because it makes it very convenient to mix metadata into the recs queries. In the case of the demo site, where the items are videos, I have a bunch of recommendation types:
> 1) User-history-based recs: the query is the user’s recent “likes” history, run against the videos collection’s similar-items field, which is a list of video id strings. This is what people usually think a recommender does, but it is only the start.
> 2-9) These use various methods of biasing the results by genre metadata. Search engines also allow filtering by fields, so you can specify videos filtered by source. For example, you can get comedies based on your “likes”, filtered by source = Netflix. In fact, when you set the source filter to Netflix, every set of recs will contain only videos on Netflix.
> 
> There are so many ways to combine bias, filter, and what you use as the query that putting the fields in a DB made the most sense. I am still thinking of new ways to use this. One example is item-set similarity, which is used to give shopping cart recs in some systems. On the demo site you could do the same with the watchlist if there were enough watchlists: use the user’s watchlist as the query against all other watchlists, get back an ordered set of the watchlists most similar to yours, and take recs from there.
> 
> Some day I’ll write some blog posts about it but I’d encourage anyone with data to try the DB route rather than raw indexing of the text files just for the amazing flexibility and convenience it brings.
> 

Re: Solr recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Yes, it already does. It’s not named well; all it really does is create an indicator matrix (item-item similarity using LLR, the log-likelihood ratio) in a form that is digestible by a text indexer. You could use Solr or Elasticsearch to do the indexing and queries.

In the actual installation on the demo site, https://guide.finderbots.com, the indicator matrix is put into a DB and Solr is used to index the item collection’s similarity data field. The queries are handled by the web app framework. If I swapped out Solr for Elasticsearch for indexing the DB, it would work just fine; I looked into how to integrate it with my web app framework (RoR). The integration methods were significantly different, though, so I chose not to do both.

I chose to put the indicator matrix in the DB because it makes it very convenient to mix metadata into the recs queries. In the case of the demo site, where the items are videos, I have a bunch of recommendation types:
1) User-history-based recs: the query is the user’s recent “likes” history, run against the videos collection’s similar-items field, which is a list of video id strings. This is what people usually think a recommender does, but it is only the start.
2-9) These use various methods of biasing the results by genre metadata. Search engines also allow filtering by fields, so you can specify videos filtered by source. For example, you can get comedies based on your “likes”, filtered by source = Netflix. In fact, when you set the source filter to Netflix, every set of recs will contain only videos on Netflix.

There are so many ways to combine bias, filter, and what you use as the query that putting the fields in a DB made the most sense. I am still thinking of new ways to use this. One example is item-set similarity, which is used to give shopping cart recs in some systems. On the demo site you could do the same with the watchlist if there were enough watchlists: use the user’s watchlist as the query against all other watchlists, get back an ordered set of the watchlists most similar to yours, and take recs from there.
 
Some day I’ll write some blog posts about it but I’d encourage anyone with data to try the DB route rather than raw indexing of the text files just for the amazing flexibility and convenience it brings.

On Apr 26, 2014, at 8:25 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:

Pat,
I was wondering if you'd given any thought to genericizing the Solr recommender to work with both Solr and Elasticsearch. Namely, are there pieces of the recommender that could plug into, or be lifted above, a search engine (or, in the case of Elasticsearch, a set of REST APIs)? I would be very interested in helping out with this.

Thoughts?

Sent from my iPad