Posted to user@mahout.apache.org by "mickey.guoyun" <mi...@gmail.com> on 2011/03/28 14:56:38 UTC

Mahout and Lucene

Hi everyone,
I am new to Mahout. I use Lucene to build a search engine, and I know Mahout can build vectors from Lucene's index files.
I want to recommend similar documents, and the recommender should not need users' preferences.
How do I code this? Can anybody give me an example?
Thank you!

2011-03-28
mickey.guoyun

Re: Mahout and Lucene

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Mickey,

You could use RowSimilarityJob 
(http://search-lucene.com/jd/mahout/core/org/apache/mahout/math/hadoop/similarity/RowSimilarityJob.html) 
to compute the pairwise similarities between your document vectors.
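
To make "pairwise similarities between document vectors" concrete, here is a minimal in-memory sketch using cosine similarity. It is not RowSimilarityJob and does not use the Mahout API: the class name DocSimilaritySketch, the sparse Map<Integer, Double> representation (term id -> weight), and the choice of cosine as the measure are all assumptions made for illustration. It is only meant for a collection small enough to fit in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout or Lucene.
class DocSimilaritySketch {

    /** Cosine similarity between two sparse vectors (term id -> weight). */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller vector so the dot product stays cheap.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // An empty vector is treated as similar to nothing.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /** Indexes of the topN documents most similar to docId, best first. */
    static List<Integer> mostSimilar(int docId, List<Map<Integer, Double>> docs, int topN) {
        double[] scores = new double[docs.size()];
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (i == docId) {
                continue; // never recommend the document itself
            }
            scores[i] = cosine(docs.get(docId), docs.get(i));
            candidates.add(i);
        }
        candidates.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(topN, candidates.size())));
    }

    public static void main(String[] args) {
        // Three toy documents with made-up term ids and weights.
        Map<Integer, Double> d0 = new HashMap<>();
        d0.put(1, 0.5);
        d0.put(2, 1.2);
        Map<Integer, Double> d1 = new HashMap<>();
        d1.put(1, 0.4);
        d1.put(2, 1.0);
        d1.put(7, 0.3);
        Map<Integer, Double> d2 = new HashMap<>();
        d2.put(9, 2.0);

        List<Map<Integer, Double>> docs = new ArrayList<>();
        docs.add(d0);
        docs.add(d1);
        docs.add(d2);
        System.out.println(mostSimilar(0, docs, 2));
    }
}
```

Comparing every document against every other one like this is quadratic in the number of documents, which is the reason to move to a distributed job once the index gets large.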

Before doing that, make sure that terms that occur very often 
(stopwords, for example) are removed from your vectors, as they will 
significantly slow down the computation!
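
One way to do this pruning, sketched under assumptions: count in how many documents each term occurs and drop every term whose document frequency exceeds a chosen fraction of the collection. The class name FrequentTermPruner, the Map<String, Double> document representation, and the maxDfFraction parameter are hypothetical and not taken from Mahout. The intuition is that a term shared by k documents links roughly k*k document pairs, so a handful of very common terms can dominate the work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mahout or Lucene.
class FrequentTermPruner {

    /** Terms that occur in more than maxDfFraction of all documents. */
    static Set<String> frequentTerms(List<Map<String, Double>> docs, double maxDfFraction) {
        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Double> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Set<String> frequent = new HashSet<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            if (e.getValue() > maxDfFraction * docs.size()) {
                frequent.add(e.getKey());
            }
        }
        return frequent;
    }

    /** Returns copies of the documents with the frequent terms removed. */
    static List<Map<String, Double>> prune(List<Map<String, Double>> docs, double maxDfFraction) {
        Set<String> frequent = frequentTerms(docs, maxDfFraction);
        List<Map<String, Double>> pruned = new ArrayList<>();
        for (Map<String, Double> doc : docs) {
            Map<String, Double> copy = new HashMap<>(doc); // leave the input untouched
            copy.keySet().removeAll(frequent);
            pruned.add(copy);
        }
        return pruned;
    }

    public static void main(String[] args) {
        // Toy documents with made-up weights; "the" occurs in all of them.
        Map<String, Double> d0 = new HashMap<>();
        d0.put("the", 1.0);
        d0.put("lucene", 2.0);
        Map<String, Double> d1 = new HashMap<>();
        d1.put("the", 1.0);
        d1.put("mahout", 3.0);
        Map<String, Double> d2 = new HashMap<>();
        d2.put("the", 2.0);
        d2.put("hadoop", 1.0);

        List<Map<String, Double>> docs = new ArrayList<>();
        docs.add(d0);
        docs.add(d1);
        docs.add(d2);
        System.out.println(prune(docs, 0.5));
    }
}
```

The threshold is a judgment call: a low value removes more terms and speeds things up, but may also drop terms that carry real signal in a narrow collection.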

Btw, how many documents do you have in your index? AFAIK Lucene has a 
class called MoreLikeThis, which finds similar documents in real time. 
Have you already tried that? It might be a simpler solution.

--sebastian
