Posted to java-user@lucene.apache.org by Andy Nauli <an...@utoronto.ca> on 2003/05/15 16:53:33 UTC

computing document similarity

Hi Lucene developers,

I am planning to use Lucene to calculate document similarity.

Here's my plan:

I have a set of similar documents. I will index these documents and
extract, say, the 20 most frequent indexed keywords.
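
To make the extraction step concrete, here is a minimal sketch in plain
Java. It is not Lucene API code: the class name TopKeywords, the simple
split-on-non-alphanumerics tokenizer, and the alphabetical tie-breaking are
all assumptions for illustration, and a real index would supply its own
analyzer and term statistics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TopKeywords {

    // Count how often each lowercased token occurs across all documents.
    static Map<String, Integer> countTerms(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Assumed tokenizer: split on anything that is not a-z or 0-9.
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Return up to n terms, most frequent first; ties are broken alphabetically.
    static List<String> topTerms(Map<String, Integer> counts, int n) {
        List<String> terms = new ArrayList<>(counts.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }
}
```

Calling TopKeywords.topTerms(TopKeywords.countTerms(docs), 20) would give the
20-keyword list described above, assuming stop words have already been
filtered out.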

When a new document becomes available, I want to calculate its similarity
to the set using these extracted keywords.

How feasible is this in Lucene?
What is the best way to do it? I have implemented the part that extracts
the most frequent keywords; what is left is the document similarity
calculation.
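
One possible way to frame the remaining calculation, offered only as a
sketch: treat the keyword list as the dimensions of a vector, count how
often each keyword occurs in the reference text and in the new document,
and compare the two count vectors with cosine similarity. Cosine similarity
is an assumption here, not something Lucene prescribes, and the class name
KeywordSimilarity and its tokenizer are hypothetical; the code uses no
Lucene calls.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class KeywordSimilarity {

    // Build a term-frequency vector restricted to the given keywords.
    static Map<String, Integer> vector(String text, List<String> keywords) {
        Map<String, Integer> tf = new HashMap<>();
        for (String keyword : keywords) {
            tf.put(keyword, 0);
        }
        // Assumed tokenizer: split on anything that is not a-z or 0-9.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (tf.containsKey(token)) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity of two count vectors; 0.0 if either vector is all zeros.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int x = e.getValue();
            int y = b.getOrDefault(e.getKey(), 0);
            dot += (double) x * y;
            normA += (double) x * x;
        }
        for (int y : b.values()) {
            normB += (double) y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A score near 1 would mean the new document uses the keywords in roughly the
same proportions as the reference set, and a score of 0 would mean it
contains none of them. Where to put the threshold between "similar" and
"not similar" would have to be tuned on real data.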

Thanks,
Andy

