Posted to user@mahout.apache.org by Vinicius Carvalho <vi...@gmail.com> on 2009/04/03 02:13:33 UTC

Using Taste to recommend documents

Hi there! I would like to build a document recommendation system, and one of
the approaches I wish to experiment with is using Taste for that task. One idea
I had was to model documents as users, words as items, and word frequencies in
documents as preferences.

Am I going in the right direction here?

Also, I'm a bit worried about memory consumption. So far we only have 6k
documents (each of which may have a few hundred words). But would Taste
scale to, let's say, 100k documents with a few hundred words each?

Best regards

-- 
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.

Re: Recommender impl using interpolated knn and SVD

Posted by Grant Ingersoll <gs...@apache.org>.

On Apr 10, 2009, at 2:17 PM, Sean Owen wrote:

> Grant, it sounds like it's sufficient to select the option in JIRA, when
> attaching a patch file, that says permission is given to include it in
> the project. Is that right?

Yeah, it sounds that way.

>
>
> Go ahead and generate a patch then. If that is inconvenient or
> difficult you could send me the code and I can create a patch/issue
> for you to bless.

A patch is the recommended way; that way the little permission checkbox
gets checked, etc.

>
>
> Indeed as Grant says, keep all of mahout-dev in the loop though I can
> take charge of integrating this. (Or Grant you're welcome to as well,
> just happy to volunteer to do all the legwork.)

You can definitely take the lead, but I'm learning CF more and more
each day, so it is good for me to pay attention.

Re: Recommender impl using interpolated knn and SVD

Posted by Sean Owen <sr...@gmail.com>.

Grant, it sounds like it's sufficient to select the option in JIRA, when
attaching a patch file, that says permission is given to include it in
the project. Is that right?

Go ahead and generate a patch then. If that is inconvenient or
difficult you could send me the code and I can create a patch/issue
for you to bless.

Indeed as Grant says, keep all of mahout-dev in the loop though I can
take charge of integrating this. (Or Grant you're welcome to as well,
just happy to volunteer to do all the legwork.)

Sean

On Fri, Apr 10, 2009 at 4:27 PM, Andre Panisson <pa...@di.unito.it> wrote:
> Hi Sean, very good! I will wait for the comments from the Apache folk, and
> if everything is ok, I will open a JIRA issue with the patch.
>
> André

Re: Recommender impl using interpolated knn and SVD

Posted by Andre Panisson <pa...@di.unito.it>.

Hi Sean, very good! I will wait for the comments from the Apache folk,  
and if everything is ok, I will open a JIRA issue with the patch.

André

Quoting Sean Owen <sr...@gmail.com>:

> Wow, very nice. I'm the guy to talk to. I had taken a crack at
> SVD-related implementations a few years ago and found it too
> complicated for my brain, and too slow. It is good to hear you have
> had success. It has been some time since I myself added any new
> algorithms.
>
> Perhaps the more experienced Apache folk can comment on this -- is
> this significant enough of a contribution that it needs a software
> grant agreement of some kind? Or is it OK to open a JIRA issue, attach
> a patch, and confirm there that it is granted to the ASF?
>
> Beyond that, the next step is to open an issue at
> http://issues.apache.org/jira/browse/MAHOUT and attach a patch which
> includes the changes. I can commit it, then I will probably have some
> tweaks and other changes I would like to make, which we can
> collaborate on.
>
> Sean
>
> On Fri, Apr 10, 2009 at 3:38 PM, Andre Panisson <pa...@di.unito.it> wrote:
>> Hi,
>>
>> I have an item-based recommender implementation that uses kNN with
>> interpolated weights, based on the paper by Robert M. Bell and Yehuda Koren
>> at ICDM '07, and another implementation that uses Singular Value Decomposition
>> to capture the features of a dataset. I had very good results from these two
>> implementations, and I would like to share them with this project in order to
>> improve them and to learn about other people's results. What do I need to do?
>>
>> Thanks,
>> André
>>
>>
>

Re: Recommender impl using interpolated knn and SVD

Posted by Sean Owen <sr...@gmail.com>.

Wow, very nice. I'm the guy to talk to. I had taken a crack at
SVD-related implementations a few years ago and found it too
complicated for my brain, and too slow. It is good to hear you have
had success. It has been some time since I myself added any new
algorithms.

Perhaps the more experienced Apache folk can comment on this -- is
this significant enough of a contribution that it needs a software
grant agreement of some kind? Or is it OK to open a JIRA issue, attach
a patch, and confirm there that it is granted to the ASF?

Beyond that, the next step is to open an issue at
http://issues.apache.org/jira/browse/MAHOUT and attach a patch which
includes the changes. I can commit it, then I will probably have some
tweaks and other changes I would like to make, which we can
collaborate on.

Sean

On Fri, Apr 10, 2009 at 3:38 PM, Andre Panisson <pa...@di.unito.it> wrote:
> Hi,
>
> I have an item-based recommender implementation that uses kNN with
> interpolated weights, based on the paper by Robert M. Bell and Yehuda Koren
> at ICDM '07, and another implementation that uses Singular Value Decomposition
> to capture the features of a dataset. I had very good results from these two
> implementations, and I would like to share them with this project in order to
> improve them and to learn about other people's results. What do I need to do?
>
> Thanks,
> André
>
>

Re: Recommender impl using interpolated knn and SVD

Posted by Grant Ingersoll <gs...@apache.org>.

Hey André,

This sounds interesting. http://cwiki.apache.org/MAHOUT/howtocontribute.html
is the best starting place. The gist of it is: create a patch, attach it to a
JIRA issue, and then pop over to mahout-dev@lucene.apache.org and work with
us to get it committed.

Let us know if you have any issues with patching, etc.

-Grant

On Apr 10, 2009, at 10:38 AM, Andre Panisson wrote:

> Hi,
>
> I have an item-based recommender implementation that uses kNN with
> interpolated weights, based on the paper by Robert M. Bell and
> Yehuda Koren at ICDM '07, and another implementation that uses
> Singular Value Decomposition to capture the features of a dataset. I
> had very good results from these two implementations, and I would
> like to share them with this project in order to improve them and to
> learn about other people's results. What do I need to do?
>
> Thanks,
> André
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Recommender impl using interpolated knn and SVD

Posted by Andre Panisson <pa...@di.unito.it>.

Hi,

I have an item-based recommender implementation that uses kNN with
interpolated weights, based on the paper by Robert M. Bell and Yehuda
Koren at ICDM '07, and another implementation that uses Singular Value
Decomposition to capture the features of a dataset. I had very good
results from these two implementations, and I would like to share them
with this project in order to improve them and to learn about other
people's results. What do I need to do?

Thanks,
André

Re: Using Taste to recommend documents

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Quick one for the original poster: You could also use Solr/Lucene's MoreLikeThis, for example.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Sean Owen <sr...@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Friday, April 3, 2009 12:54:40 AM
> Subject: Re: Using Taste to recommend documents
> 
> You could do that. But then, the system would be recommending words to
> documents! Not quite what you want. I assume you still want to
> recommend documents to (real) users.
> 
> I would use other techniques to determine document similarity. Others
> on this list can suggest ideas, but simple metrics based on word
> frequency should do well. Then use that logic to create an
> implementation of ItemSimilarity. Then build a DataModel, perhaps a
> FileDataModel, maybe from a file containing user IDs, document IDs,
> and preference values. Then try a GenericItemBasedRecommender based on
> these components. We can discuss these in more detail later.
> 
> Assuming you go this way, a couple thousand documents (and a couple
> thousand users?) should be no problem to process in memory. It should
> be fast. I would, perhaps, make sure that your ItemSimilarity caches
> results, or is based on pre-computed values, since it would be slow to
> re-compute those over and over at runtime.
> 
> Sean
> 
> On Apr 3, 2009 7:14 AM, "Vinicius Carvalho" wrote:
> 
> Hi there! I would like to build a document recommendation system, and one of
> the approaches I wish to experiment with is using Taste for that task. One idea
> I had was to model documents as users, words as items, and word frequencies in
> documents as preferences.
> 
> Am I going in the right direction here?
> 
> Also, I'm a bit worried about memory consumption. So far we only have 6k
> documents (each of which may have a few hundred words). But would Taste
> scale to, let's say, 100k documents with a few hundred words each?
> 
> Best regards
> 
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.

Re: Using Taste to recommend documents

Posted by Sean Owen <sr...@gmail.com>.

You could do that. But then, the system would be recommending words to
documents! Not quite what you want. I assume you still want to
recommend documents to (real) users.

I would use other techniques to determine document similarity. Others
on this list can suggest ideas, but simple metrics based on word
frequency should do well. Then use that logic to create an
implementation of ItemSimilarity. Then build a DataModel, perhaps a
FileDataModel, maybe from a file containing user IDs, document IDs,
and preference values. Then try a GenericItemBasedRecommender based on
these components. We can discuss these in more detail later.
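
To make the word-frequency idea concrete, here is a minimal sketch of one
possible metric: cosine similarity between term-frequency vectors. The class
and method names (WordFrequencySimilarity, termFrequencies, cosine) are
hypothetical and are not part of Taste; the whitespace tokenizer is also just
an assumption for illustration. How this would be wired into an
ItemSimilarity implementation is not shown, since the exact methods of that
interface are not spelled out in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Taste/Mahout.
final class WordFrequencySimilarity {

    private WordFrequencySimilarity() {}

    /** Counts how often each lower-cased, whitespace-separated token occurs. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two term-frequency vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue().doubleValue() * other.doubleValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

Two documents with identical word counts score 1.0 and documents sharing no
words score 0.0. Raw counts favor common words, so a weighting scheme such as
TF-IDF may work better in practice.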

Assuming you go this way, a couple thousand documents (and a couple
thousand users?) should be no problem to process in memory. It should
be fast. I would, perhaps, make sure that your ItemSimilarity caches
results, or is based on pre-computed values, since it would be slow to
re-compute those over and over at runtime.
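
The caching suggestion could look roughly like the sketch below. It wraps any
pairwise similarity function and remembers each result, so a given pair of
documents is only ever computed once. Everything here is hypothetical
(CachingPairSimilarity, PairSimilarity, the use of long document IDs) and is
not Taste API; it also assumes the underlying similarity is symmetric, so
(a, b) and (b, a) share one cache entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical caching wrapper for illustration only; not part of Taste/Mahout.
final class CachingPairSimilarity {

    /** Any (possibly expensive) symmetric similarity between two document IDs. */
    interface PairSimilarity {
        double similarity(long itemA, long itemB);
    }

    private final PairSimilarity delegate;
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    CachingPairSimilarity(PairSimilarity delegate) {
        this.delegate = delegate;
    }

    /** Returns the cached value if present; otherwise computes and stores it. */
    double similarity(long itemA, long itemB) {
        // Order the pair so that (a, b) and (b, a) map to the same key.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        String key = lo + ":" + hi;
        return cache.computeIfAbsent(key, k -> delegate.similarity(lo, hi));
    }

    /** Number of distinct pairs computed so far. */
    int cachedPairs() {
        return cache.size();
    }
}
```

The cache is unbounded, which is fine for a few thousand documents but would
need an eviction policy, or a pre-computed table restricted to the most
similar pairs, as the document count grows.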

Sean

On Apr 3, 2009 7:14 AM, "Vinicius Carvalho" <vi...@gmail.com> wrote:

Hi there! I would like to build a document recommendation system, and one of
the approaches I wish to experiment with is using Taste for that task. One idea
I had was to model documents as users, words as items, and word frequencies in
documents as preferences.

Am I going in the right direction here?

Also, I'm a bit worried about memory consumption. So far we only have 6k
documents (each of which may have a few hundred words). But would Taste
scale to, let's say, 100k documents with a few hundred words each?

Best regards

--
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.