Posted to java-user@lucene.apache.org by Kamal Najib <ka...@mytum.de> on 2009/05/04 13:34:34 UTC

get the cosine similarity between two docs

Hi all, 
I am trying to get the cosine similarity between two docs.
First I create a Document for each string like this:
Document doc1=new Document();
doc1.add(new Field("term","nodular lesions over years responding kamal najib nodular lesions over years responding",Field.Store.YES,Field.Index.TOKENIZED));
Document doc2=new Document();
doc2.add(new Field("term","we describe 5 cases( kamal najib , 61 years old )",Field.Store.YES,Field.Index.TOKENIZED));
Then I add both docs to an IndexWriter:
writer.addDocument(doc1);
writer.addDocument(doc2);

Then I create a query from doc2:
IndexReader ir=IndexReader.open(directory);
MoreLikeThis mlt=new MoreLikeThis(ir);
Query query=mlt.like(1);
and then run the query:
IndexSearcher searcher=new IndexSearcher(directory);
ScoreDoc[] scoreDocs=searcher.search(query,5).scoreDocs;

The length of the scoreDocs array was then 0. Does that mean the two docs are not similar? When are two docs considered similar with this approach? How can I check the correctness of the similarity result? Am I doing something wrong?
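
To cross-check whatever score comes back, the cosine similarity of the two strings can be computed directly from raw term counts. The following is only a minimal sketch in plain Java. It assumes simple lowercasing and splitting on non-alphanumeric characters, which only approximates a Lucene analyzer, and it uses no idf weighting or length normalization, so the value will not match Lucene's own score. The class name CosineCheck and its methods are hypothetical and not part of Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine similarity over raw term counts.
class CosineCheck {

    // Count terms after lowercasing and splitting on anything that is not a letter or digit.
    // Assumption: this only approximates what a Lucene analyzer would produce.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.length() == 0) {
                continue; // split can yield an empty leading token
            }
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either text has no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : va.entrySet()) {
            Integer other = vb.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double na = norm(va);
        double nb = norm(vb);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        String doc1 = "nodular lesions over years responding kamal najib "
                + "nodular lesions over years responding";
        String doc2 = "we describe 5 cases( kamal najib , 61 years old )";
        System.out.println(cosine(doc1, doc2));
    }
}
```

With this tokenization the two strings share the terms "kamal", "najib" and "years", so the result is 4 / (3 * sqrt(22)), roughly 0.28. A non-zero value here suggests the docs do overlap, and that an empty result from the search has another cause.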
Thanks.
Kamal