You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Kamal Najib <ka...@mytum.de> on 2009/05/05 23:23:37 UTC
I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!
hi all,
i got the similarity score 0.3044460713863373 between two docs which have the same text content, is it correct? I expected 1.0, hier is my result line:
doc:"this expression of galectin-1 in blood vessel walls was correlated with vascular"
doc2 :"this expression of galectin-1 in blood vessel walls was correlated with vascular" Score :"0.3044460713863373"
is the score correct?
my methode is :
public double getSimilarity(String v1,String v2) throws Exception
{
float result=0;
directory = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer();
IndexWriter writer = new IndexWriter(directory, analyzer,
true, IndexWriter.MaxFieldLength.LIMITED);
Document doc1 = new Document();
doc1.add(new Field("term",v1, Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc1);
writer.close();
IndexReader ir=IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(directory);
Query query=SimilarityQueries.formSimilarQuery(v2,analyzer,"term",null);
ScoreDoc[] scoreDocs = searcher.search(query,5).scoreDocs;
int docNum = scoreDocs[0].doc;
result = scoreDocs[0].score;
Document hitDoc = searcher.doc(docNum);
System.out.println("Term 1 :"+v2+" Term2:"+hitDoc.get("term")+" Score :"+result);
return result;
}
please help.
thanks in advance.
Kamal
--
Re: I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!
Posted by Grant Ingersoll <gs...@apache.org>.
What is SimilarityQueries? I'd try the explain capabilities to see
more.
On May 5, 2009, at 2:23 PM, Kamal Najib wrote:
> hi all,
> i got the similarity score 0.3044460713863373 between two docs which
> have the same text content, is it correct? I expected 1.0, hier is
> my result line:
>
> doc:"this expression of galectin-1 in blood vessel walls was
> correlated with vascular"
> doc2 :"this expression of galectin-1 in blood vessel walls was
> correlated with vascular" Score :"0.3044460713863373"
> is the score correct?
> my methode is :
> public double getSimilarity(String v1,String v2) throws Exception
> {
>
> float result=0;
> directory = new RAMDirectory();
> Analyzer analyzer = new StandardAnalyzer();
> IndexWriter writer = new IndexWriter(directory, analyzer,
> true, IndexWriter.MaxFieldLength.LIMITED);
>
>
> Document doc1 = new Document();
> doc1.add(new Field("term",v1, Field.Store.YES,
> Field.Index.TOKENIZED));
> writer.addDocument(doc1);
> writer.close();
> IndexReader ir=IndexReader.open(directory);
> IndexSearcher searcher = new IndexSearcher(directory);
> Query
> query=SimilarityQueries.formSimilarQuery(v2,analyzer,"term",null);
> ScoreDoc[] scoreDocs = searcher.search(query,5).scoreDocs;
> int docNum = scoreDocs[0].doc;
> result = scoreDocs[0].score;
> Document hitDoc = searcher.doc(docNum);
> System.out.println("Term 1 :"+v2+" Term2:"+hitDoc.get("term")+"
> Score :"+result);
> return result;
> }
> please help.
> thanks in advance.
> Kamal
> --
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org