Posted to java-user@lucene.apache.org by Hadas Cohen <ha...@gmail.com> on 2006/09/28 21:01:20 UTC

MoreLikeThis does not retrieve all terms when using like()

Ever since I started using Lucene, I have found the answers to all of my
questions in the archive.

But I need help with the following two.

1.	I am using the MoreLikeThis class and cannot figure out why not all
terms are retrieved when using like() to generate queries.
I extracted the terms from a document using getTermFreqVectors(i) and got
about 1160 terms. But when generating the query using like() on the exact
same reader, I got only about 760 terms in the query.

I set up the field names and stop words correctly, along with the following:

      // ANALYZER is a Snowball analyzer, the same one I created the index with
      mlt.setAnalyzer(ANALYZER);
      mlt.setMinDocFreq(0);
      mlt.setMinTermFreq(0);
      mlt.setMaxQueryTerms(2000);

I was also trying to understand the logic behind the order in which the
terms appear when retrieving queries with like(), but it seems random
(relative to the termFreqVectors, which are strictly sorted).
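
One way to narrow this down would be to diff the two term lists and look at
which terms get dropped. The following is only a minimal sketch in plain
Java, assuming the terms from getTermFreqVectors(i) and the terms of the
like() query have already been copied into String arrays. The class and
method names (TermDiff, missingTerms) and the sample terms are hypothetical
and not part of Lucene:

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: reports which terms from the
// term vectors do not show up in the query generated by like().
class TermDiff {

    // Returns the distinct terms present in vectorTerms but absent from
    // queryTerms, in sorted order. Duplicates (for example, the same word
    // occurring in several fields) are collapsed by the set.
    static SortedSet<String> missingTerms(String[] vectorTerms, String[] queryTerms) {
        SortedSet<String> missing = new TreeSet<String>(Arrays.asList(vectorTerms));
        missing.removeAll(Arrays.asList(queryTerms));
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data, standing in for the real term lists.
        String[] fromVectors = { "run", "jump", "walk", "talk", "jump" };
        String[] fromQuery = { "jump", "talk" };
        System.out.println(missingTerms(fromVectors, fromQuery)); // prints [run, walk]
    }
}
```

Comparing the number of distinct terms with the raw count from the term
vectors might also show whether part of the gap is simply the same word
being counted once per field.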

2.	Not related to 1: I need to build a crawler for my project and
was wondering if there are any suggestions for a convenient API (since LARM
is no longer available).

 

Any advice will be highly appreciated!