
Posted to java-user@lucene.apache.org by Kamal Najib <ka...@mytum.de> on 2009/05/06 11:18:44 UTC

Re: Re: I got the score "0.3044460713863373" for the cosine similarity of two documents with the same text content !!

hi,
thanks for the reply. See: http://lucene.apache.org/java/2_4_1/api/index.html
There you will find SimilarityQueries, the class I used to build and run the query that gets the similarity between the two strings. I did the following:
I created a doc:
doc.add(new Field("term","this expression of galectin-1 in blood vessel walls was correlated with vascular", Field.Store.YES,Field.Index.TOKENIZED));
Then I indexed it and ran the following similarity query to get the cosine similarity:
query=SimilarityQueries.formSimilarQuery("this expression of galectin-1 in blood vessel walls was correlated with vascular",analyzer,"term",null);
ScoreDoc[] scoreDocs = searcher.search(query,5).scoreDocs;
I got the score mentioned above (0.3044460713863373).
thanks.
kamal
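For comparison, the textbook cosine similarity Kamal expected compares two term-frequency vectors as dot(a, b) / (|a| |b|), which is 1.0 for identical texts. The following is a minimal stdlib-only sketch of that formula; the class name `CosineSketch` and its regex tokenizer are assumptions for illustration, and this is not what SimilarityQueries or Lucene's scoring computes.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSketch {

    // Raw term-frequency vector. Assumed tokenizer for illustration only:
    // lowercase, split on anything that is not a letter, digit, or hyphen
    // (so "galectin-1" stays one term). No stop words are removed.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}-]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double denom = norm(a) * norm(b);
        if (denom == 0.0) {
            return 0.0;
        }
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        return dot / denom;
    }

    public static void main(String[] args) {
        String text = "this expression of galectin-1 in blood vessel walls was correlated with vascular";
        System.out.println(cosine(termFrequencies(text), termFrequencies(text)));
    }
}
```

Running `main` prints a value equal to 1.0 up to floating-point rounding, because the two vectors are identical.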
Original Message:

What is SimilarityQueries? I'd try the explain capabilities to see more.

On May 5, 2009, at 2:23 PM, Kamal Najib wrote:

> hi all,
> I got a similarity score of 0.3044460713863373 between two docs which
> have the same text content. Is that correct? I expected 1.0. Here is
> my result line:
>
> doc:"this expression of galectin-1 in blood vessel walls was correlated with vascular"
> doc2 :"this expression of galectin-1 in blood vessel walls was correlated with vascular"	Score :"0.3044460713863373"
> is the score correct?
> my method is:
> public double getSimilarity(String v1, String v2) throws Exception
> {
>     // Index v1 as the only document in a fresh in-memory index.
>     Directory directory = new RAMDirectory();
>     Analyzer analyzer = new StandardAnalyzer();
>     IndexWriter writer = new IndexWriter(directory, analyzer,
>         true, IndexWriter.MaxFieldLength.LIMITED);
>     Document doc1 = new Document();
>     doc1.add(new Field("term", v1, Field.Store.YES, Field.Index.TOKENIZED));
>     writer.addDocument(doc1);
>     writer.close();
>
>     // Query with v2's terms and take the score of the top hit, if any.
>     IndexSearcher searcher = new IndexSearcher(directory);
>     Query query = SimilarityQueries.formSimilarQuery(v2, analyzer, "term", null);
>     ScoreDoc[] scoreDocs = searcher.search(query, 5).scoreDocs;
>     float result = 0;
>     if (scoreDocs.length > 0) {
>         result = scoreDocs[0].score;
>         Document hitDoc = searcher.doc(scoreDocs[0].doc);
>         System.out.println("Term 1 :" + v2 + "  Term2:" + hitDoc.get("term")
>             + "  Score :" + result);
>     }
>     searcher.close();
>     return result;
> }
> please help.
> thanks in advance.
> Kamal
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search

--

Re: I got the score "0.3044460713863373" for the cosine similarity of two documents with the same text content !!

Posted by Grant Ingersoll <gs...@apache.org>.

What does the searcher.explain() method say?

-Grant
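A Lucene score is not a normalized cosine, so identical text need not score 1.0. An explain() printout would break the score into DefaultSimilarity's factors (tf, idf, queryNorm, fieldNorm, coord). The sketch below recomputes those factors by hand for this thread's one-document index. It is my reading of Lucene 2.4's DefaultSimilarity, not Lucene code, and the class `LuceneScoreSketch` is hypothetical. Assumptions: StandardAnalyzer drops its default English stop words and keeps galectin-1 as one token (approximated here with a regex split), formSimilarQuery builds one optional clause per distinct term, and norms use Lucene's one-byte encoding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class LuceneScoreSketch {

    // Assumed copy of StandardAnalyzer's default English stop words (Lucene 2.4).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"));

    // Rough stand-in for StandardAnalyzer + formSimilarQuery: lowercase, split,
    // drop stop words, keep each distinct term once (one optional clause per term).
    static Set<String> queryTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}-]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // DefaultSimilarity idf: log(numDocs / (docFreq + 1)) + 1
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // lengthNorm = 1/sqrt(numTerms), then stored in one byte, which rounds it down.
    // This mirrors the SmallFloat "byte315" encode/decode round trip.
    static float storedLengthNorm(int numTerms) {
        float f = (float) (1.0 / Math.sqrt(numTerms));
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // exponent plus top 2 stored mantissa bits
        int zero = (63 - 15) << 3;
        int b;
        if (small < zero) {
            b = (bits <= 0) ? 0 : 1;
        } else if (small >= zero + 0x100) {
            b = 255;
        } else {
            b = small - zero;
        }
        return b == 0 ? 0.0f : Float.intBitsToFloat((b << 21) + ((63 - 15) << 24));
    }

    // Score of a document containing each query term once, against an OR query
    // of those terms, where every term has docFreq 1 in an index of numDocs docs.
    static float score(int termCount, int numDocs) {
        float idf = idf(1, numDocs);
        float queryNorm = (float) (1.0 / Math.sqrt(termCount * idf * idf));
        float fieldNorm = storedLengthNorm(termCount);
        float coord = 1.0f;              // all query terms match: overlap / maxOverlap
        float sum = 0.0f;
        for (int i = 0; i < termCount; i++) {
            float tf = 1.0f;             // tf(freq) = sqrt(freq) with freq = 1
            sum += tf * idf * idf * queryNorm * fieldNorm;
        }
        return coord * sum;
    }

    public static void main(String[] args) {
        String text = "this expression of galectin-1 in blood vessel walls was correlated with vascular";
        Set<String> terms = queryTerms(text);
        System.out.println(terms.size() + " terms, score = " + score(terms.size(), 1));
    }
}
```

Under these assumptions the 7 surviving terms give idf = 1 + ln(1/2) ≈ 0.307 and a stored length norm of 0.375 instead of 1/sqrt(7) ≈ 0.378, so the score reduces to sqrt(7) × idf × 0.375 ≈ 0.3044, matching the value reported in the thread.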
