Posted to java-user@lucene.apache.org by Thomas Keller <ti...@gmx.net> on 2013/01/15 21:22:03 UTC

Lucene-MoreLikethis

Hey, 

I have a question about "MoreLikeThis" in Lucene (Java). I built an index and want to find documents similar to a given one, but my query never returns any results: the query produced by mlt.like(1) always matches nothing. Can anyone spot my mistake? Here is an example (I use Lucene 4.0).

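import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queries.mlt.MoreLikeThis;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {
  public static void main(String[] args) throws IOException {

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

    // index four small documents (title + isbn)
    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, "Lucene in Action", "193398817");
    addDoc(w, "Lucene for Dummies", "55320055Z");
    addDoc(w, "Managing Gigabytes", "55063554A");
    addDoc(w, "The Art of Computer Science", "9900333X");
    w.close();

    // search
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);

    // build a "more like this" query from document number 1 (default settings)
    MoreLikeThis mlt = new MoreLikeThis(reader);
    Query query = mlt.like(1);
    System.out.println(searcher.search(query, 5).totalHits);
  }

  private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));

    // use a string field for isbn because we don't want it tokenized
    doc.add(new StringField("isbn", isbn, Field.Store.YES));
    w.addDocument(doc);
  }
}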

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene-MoreLikethis

Posted by Jack Krupansky <ja...@basetechnology.com>.
There are lots of parameters you can adjust, but the defaults essentially 
assume that you have a fairly large corpus and aren't interested in 
low-frequency terms.

So, try lowering the threshold with MoreLikeThis#setMinDocFreq. The default 
is 5, and you don't have any terms in your example with a doc freq over 2.

Also, try lowering setMinTermFreq. The default is 2, and you don't have any 
terms with a term frequency above 1.

-- Jack Krupansky
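
To make the two thresholds from the reply concrete, here is a minimal sketch in plain Java (no Lucene dependency) that counts term and document frequencies for the four titles from the question and applies a minimum term frequency and a minimum document frequency. It is a simplified model, not Lucene's implementation: the class name MltThresholdSketch, the whitespace tokenizer, and the tiny stop-word list are assumptions standing in for StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of threshold-based term selection (hypothetical helper,
// not part of Lucene and not Lucene's actual algorithm).
class MltThresholdSketch {

    // The four titles indexed in the question.
    static final List<String> TITLES = Arrays.asList(
            "Lucene in Action",
            "Lucene for Dummies",
            "Managing Gigabytes",
            "The Art of Computer Science");

    // Assumption: a tiny stop list stands in for the analyzer's stop words.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Lowercase, split on whitespace, drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Document frequency: in how many documents each term occurs.
    static Map<String, Integer> docFreqs(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            for (String term : new HashSet<>(tokenize(doc))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Terms of document docNum whose term frequency and document frequency
    // both reach the given minimums.
    static Set<String> selectTerms(List<String> docs, int docNum,
                                   int minTermFreq, int minDocFreq) {
        Map<String, Integer> df = docFreqs(docs);
        Map<String, Integer> tf = new HashMap<>();
        for (String term : tokenize(docs.get(docNum))) {
            tf.merge(term, 1, Integer::sum);
        }
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            if (e.getValue() >= minTermFreq && df.get(e.getKey()) >= minDocFreq) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Defaults quoted in the reply: min term freq 2, min doc freq 5.
        System.out.println("defaults (2, 5): " + selectTerms(TITLES, 1, 2, 5));
        // Relaxed thresholds suitable for a tiny test corpus.
        System.out.println("relaxed (1, 1): " + selectTerms(TITLES, 1, 1, 1));
    }
}
```

With the default thresholds no term of "Lucene for Dummies" survives, so there is nothing to build a query from; with both thresholds at 1, "dummies" and "lucene" remain. In the original program the corresponding change would be to call mlt.setMinTermFreq(1) and mlt.setMinDocFreq(1) before mlt.like(1). Since the example's fields are indexed without term vectors, MoreLikeThis may additionally need mlt.setAnalyzer(analyzer) and mlt.setFieldNames(new String[] {"title"}); that last point is an assumption about Lucene 4.0 behavior, not something stated in this thread.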

