Posted to java-user@lucene.apache.org by Winton Davies <wd...@yahoo-inc.com> on 2006/08/29 20:50:19 UTC

Straight TF-IDF cosine similarity?

Hi All,

I'm scratching my head. Can someone tell me which class implements
an efficient multi-term TF.IDF cosine similarity scoring mechanism?

There is clearly the single-term TermScorer, but I can't find the class
that would do a bucketed TF.IDF cosine, i.e. fill an accumulator
with the tf.idf^2 for each of the term posting lists until the
accumulator is full, and then compute the final score.
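
To make the accumulator idea concrete, here is a rough plain-Java sketch
of term-at-a-time scoring. It is not a Lucene class and uses no Lucene
API: the AccumulatorCosine name, the in-memory postings map, the
idf = ln(N/df) weighting, and the assumption that query terms are
distinct are all made up for illustration. With a query term frequency
of 1, the per-posting contribution is (tf * idf) * idf, i.e. the
tf.idf^2 mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a toy in-memory index, not a Lucene class.
class AccumulatorCosine {

    // Hypothetical postings: term -> (docId -> term frequency).
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int numDocs = 0;

    public void addDocument(int docId, String... tokens) {
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        numDocs++;
    }

    // Assumed idf variant: ln(N / df); 0 for unknown terms.
    private double idf(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return p == null ? 0.0 : Math.log((double) numDocs / p.size());
    }

    // Scores every document that shares at least one term with the query.
    // Assumes the query terms are distinct (query tf = 1 for each).
    public Map<Integer, Double> score(String... queryTerms) {
        // Squared document norms over all terms. A real index would
        // precompute these at indexing time rather than per query.
        Map<Integer, Double> normSq = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) {
            double idf = idf(e.getKey());
            for (Map.Entry<Integer, Integer> p : e.getValue().entrySet()) {
                double w = p.getValue() * idf;
                normSq.merge(p.getKey(), w * w, Double::sum);
            }
        }

        // Term-at-a-time: walk each posting list, adding tf * idf^2
        // into the accumulator of every document on the list.
        Map<Integer, Double> acc = new HashMap<>();
        double queryNormSq = 0.0;
        for (String term : queryTerms) {
            Map<Integer, Integer> p = postings.get(term);
            if (p == null) {
                continue;
            }
            double idf = idf(term);
            queryNormSq += idf * idf;
            for (Map.Entry<Integer, Integer> posting : p.entrySet()) {
                acc.merge(posting.getKey(), posting.getValue() * idf * idf, Double::sum);
            }
        }

        // Final pass: divide each dot product by both vector lengths.
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Double> a : acc.entrySet()) {
            double denom = Math.sqrt(normSq.get(a.getKey())) * Math.sqrt(queryNormSq);
            scores.put(a.getKey(), denom == 0.0 ? 0.0 : a.getValue() / denom);
        }
        return scores;
    }
}
```

Only documents that appear on some query term's posting list ever get an
accumulator, which is what makes this cheaper than comparing the query
against every document vector.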

I don't need a BooleanQuery; at least, that seems like overkill.

Cheers,
  Winton

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Straight TF-IDF cosine similarity?

Posted by Jason Polites <ja...@gmail.com>.
Have you looked at the MoreLikeThis class in the similarity package?
