Posted to java-user@lucene.apache.org by Thomas Keller <ti...@gmx.net> on 2013/01/15 21:22:03 UTC
Lucene-MoreLikethis
Hey,
I have a question about MoreLikeThis in Lucene (Java). I built an index and want to find similar documents, but the query returned by mlt.like(1) never matches anything: the search always reports zero hits. Can anyone spot my mistake? Here is an example (I use Lucene 4.0):
public class HelloLucene {
    public static void main(String[] args) throws IOException, ParseException {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        addDoc(w, "Lucene in Action", "193398817");
        addDoc(w, "Lucene for Dummies", "55320055Z");
        addDoc(w, "Managing Gigabytes", "55063554A");
        addDoc(w, "The Art of Computer Science", "9900333X");
        w.close();

        // search
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        MoreLikeThis mlt = new MoreLikeThis(reader);
        Query query = mlt.like(1);
        System.out.println(searcher.search(query, 5).totalHits);
    }

    private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));
        // use a string field for isbn because we don't want it tokenized
        doc.add(new StringField("isbn", isbn, Field.Store.YES));
        w.addDocument(doc);
    }
}
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene-MoreLikethis
Posted by Jack Krupansky <ja...@basetechnology.com>.
There are lots of parameters you can adjust, but the defaults essentially
assume that you have a fairly large corpus and aren't interested in
low-frequency terms.
So, try MoreLikeThis#setMinDocFreq. The default is 5. You don't have any
terms in your example with a doc freq over 2.
Also, try setMinTermFreq. The default is 2. You don't have any terms with a
term frequency above 1.
-- Jack Krupansky
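
To make the effect of the two thresholds concrete, here is a small self-contained sketch in plain Java. It does not use Lucene at all: the class MltThresholdSketch, the tokenize helper, and the tiny stop-word list are hypothetical stand-ins for the analyzer and for MoreLikeThis's term selection, and only the two frequency filters described in the reply are modeled. It assumes a term is kept when its frequency in the source document is at least minTermFreq and the number of documents containing it is at least minDocFreq.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MltThresholdSketch {
    // Hypothetical stop-word list; a real analyzer uses its own, larger set.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Simplified stand-in for an analyzer: lowercase, split on whitespace, drop stop words.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the terms of document docId that pass both frequency thresholds, in sorted order.
    public static List<String> selectTerms(List<String> corpus, int docId,
                                           int minTermFreq, int minDocFreq) {
        // Document frequency: in how many documents does each term occur?
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : corpus) {
            for (String t : new HashSet<>(tokenize(doc))) {
                docFreq.merge(t, 1, Integer::sum);
            }
        }
        // Term frequency: how often does each term occur in the source document?
        Map<String, Integer> termFreq = new TreeMap<>();
        for (String t : tokenize(corpus.get(docId))) {
            termFreq.merge(t, 1, Integer::sum);
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            if (e.getValue() >= minTermFreq && docFreq.get(e.getKey()) >= minDocFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Lucene in Action",
                "Lucene for Dummies",
                "Managing Gigabytes",
                "The Art of Computer Science");
        // Defaults from the reply (minTermFreq = 2, minDocFreq = 5): nothing survives.
        System.out.println(selectTerms(titles, 1, 2, 5)); // prints []
        // Both thresholds lowered to 1: the terms of document 1 are kept.
        System.out.println(selectTerms(titles, 1, 1, 1)); // prints [dummies, lucene]
    }
}
```

With the defaults no term of document 1 passes the filters, so there is nothing to build a query from. In the original program the corresponding change would be to call mlt.setMinTermFreq(1) and mlt.setMinDocFreq(1) before mlt.like(1).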