Posted to java-user@lucene.apache.org by David Spencer <da...@tropo.com> on 2005/01/17 22:46:23 UTC
MoreLikeThis and other similarity query generators checked in + online demo
Based on mail from Doug, I wrote a "more like this" query generator
named, well, MoreLikeThis. Bruce Ritchie and Mark Harwood contributed
changes (esp. term vector support) and bug fixes. Thanks to everyone.
I've checked in the code to the sandbox under contributions/similarity.
The package it ends up at is org.apache.lucene.search.similar -- hope
that makes sense.
I also created a class, SimilarityQueries, to hold other methods of
similarity query generation. The 2 methods in there are "dumber"
variations that use the entire source of the target doc to form a large
query.
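As a rough illustration of the "dumber" idea (a sketch only, not the
actual SimilarityQueries code): take every distinct non-stop-word in the
source text and join them into one big query. The class name
WholeDocQuerySketch, the method buildQuery, and the regex tokenizer are
all assumptions made up for this example; the real code would presumably
run the text through a Lucene Analyzer and build a query object rather
than a string.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; NOT the SimilarityQueries class from the sandbox.
class WholeDocQuerySketch {

    // Builds a query string from every distinct non-stop-word in docText.
    static String buildQuery(String docText, Set<String> stopWords) {
        // LinkedHashSet drops duplicate terms while keeping first-seen order.
        Set<String> terms = new LinkedHashSet<>();
        // Crude stand-in for an Analyzer: lowercase, split on non-letter/digit runs.
        for (String token : docText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        // Space-separated terms, intended to be read as optional (OR) clauses.
        return String.join(" ", terms);
    }
}
```

Calling buildQuery on the full text of the target doc with a small stop
list yields one long list of terms, which is why these queries get large
for long docs.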
Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/similarity/build/docs/api/org/apache/lucene/search/similar/package-summary.html
Online demo: the pages below compare the 3 variations on detecting
similar docs. The timing info (the 3 numbers marked "(ms)") may be
suspect. Also note that if you scroll to the bottom you can see the
queries that were generated.
Here's a page showing docs similar to the entry for Iraq:
http://www.searchmorph.com/kat/wikipedia-compare.jsp?s=Iraq
And here's one for docs similar to the one on Garry Kasparov (he knows
how to play chess :) ):
http://www.searchmorph.com/kat/wikipedia-compare.jsp?s=Garry_Kasparov
To get to it you start here:
http://www.searchmorph.com/kat/wikipedia.jsp
And search for something, then follow a "cmp" link on the search results page:
http://www.searchmorph.com/kat/wikipedia.jsp?s=iraq
Make sense? Useful? Has anyone done any other variations (e.g. cosine
measure)?
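For reference, a minimal sketch of what a cosine measure between two
docs could look like, using raw term counts as the vector weights. This
is not part of the checked-in code; the class CosineSketch, its methods,
and the regex tokenizer are hypothetical, and a real variation would
more likely use term vectors and tf-idf weights from the index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of a cosine measure; not code from the sandbox.
class CosineSketch {

    // Counts how often each lowercased token occurs in the text.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sumSquares = 0.0;
        for (int count : v.values()) {
            sumSquares += (double) count * count;
        }
        return Math.sqrt(sumSquares);
    }

    // Cosine of the angle between two term-frequency vectors, in [0, 1].
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // Treat an empty doc as dissimilar to everything rather than divide by zero.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }
}
```

Unlike the query generators above, this compares two docs directly, so
it would have to be computed per candidate doc rather than run as a
single search.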
- Dave