Posted to java-user@lucene.apache.org by Kamal Najib <ka...@mytum.de> on 2009/05/08 21:11:54 UTC
Re: Re: I got the score "0.3044460713863373" for the cosine similarity of two documents with the same text content!!
Thank you for the reply, I've got it.
Kamal
Original Message:
What does the searcher.explain() method say?

-Grant
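For reference: `explain()` reports a hit's score as a tree of the factors in Lucene's scoring formula (tf, idf, fieldNorm, queryNorm, coord). The following is a plain-Java sketch, not Lucene code, that re-computes those factors as documented for Lucene 2.4's `DefaultSimilarity`: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms), queryNorm = 1 / sqrt(sum of squared weights), coord = overlap / maxOverlap. The class name, method names and term counts are hypothetical, and it assumes an idealized case: every query term occurs once in the matching document, all boosts are 1, and norms are not byte-encoded the way Lucene stores them. Under those assumptions the score collapses to idf, which for a one-document index is 1 + ln(1/2), about 0.307. So the value is neither a cosine nor bounded by 1. Because the sketch ignores norm encoding and analyzer details, it should not be expected to reproduce 0.3044460713863373 exactly; `explain()` shows the real factors.

```java
// Plain-Java sketch (hypothetical names, no Lucene dependency) of the scoring
// factors documented for Lucene 2.4's DefaultSimilarity, applied to an
// idealized case: every matching query term occurs once in the document,
// boosts are 1, and norms are NOT byte-encoded as Lucene does at index time.
class DefaultSimilaritySketch {

    // idf(t) = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // tf(t in d) = sqrt(freq)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // lengthNorm(d) = 1 / sqrt(number of indexed terms in the field)
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /**
     * Score of one document for a query of queryTerms distinct terms, of which
     * matchedTerms occur (once each) in a document of docLength indexed terms.
     * Every term is assumed to have the same docFreq.
     */
    static double idealizedScore(int queryTerms, int matchedTerms, int docLength,
                                 int docFreq, int numDocs) {
        double idf = idf(docFreq, numDocs);
        // queryNorm = 1 / sqrt(sum over query terms of (idf * boost)^2), boost = 1
        double queryNorm = 1.0 / Math.sqrt(queryTerms * idf * idf);
        double sum = 0.0;
        for (int i = 0; i < matchedTerms; i++) {
            // per matching term: tf * idf^2 * boost * queryNorm * lengthNorm
            sum += tf(1) * idf * idf * queryNorm * lengthNorm(docLength);
        }
        // coord = fraction of query terms that matched
        double coord = matchedTerms / (double) queryTerms;
        return coord * sum;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 8 query terms, all matched, 8 indexed terms,
        // in a one-document index (numDocs = 1, docFreq = 1).
        System.out.printf("idf   = %.5f%n", idf(1, 1));
        System.out.printf("score = %.5f%n", idealizedScore(8, 8, 8, 1, 1));
        // The same query against a larger index in which each term is rare.
        System.out.printf("score = %.5f%n", idealizedScore(8, 8, 8, 1, 1000));
    }
}
```

The first two lines print about 0.307 for any number of terms, since the n-dependent factors cancel. With numDocs = 1000 and docFreq = 1 the same idealized score is about 7.21, well above 1.0.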

On May 6, 2009, at 2:18 AM, Kamal Najib wrote:

> Hi,
> thanks for the reply. See: http://lucene.apache.org/java/2_4_1/api/index.html
> There you will find SimilarityQueries, which I ran to get the
> similarity between the two strings. I did the following.
> I created a document:
> doc.add(new Field("term", "this expression of galectin-1 in blood vessel walls was correlated with vascular",
>         Field.Store.YES, Field.Index.TOKENIZED));
> Then I indexed it and ran the following similarity query to get the
> cosine similarity:
> query = SimilarityQueries.formSimilarQuery("this expression of galectin-1 in blood vessel walls was correlated with vascular",
>         analyzer, "term", null);
> ScoreDoc[] scoreDocs = searcher.search(query, 5).scoreDocs;
> I got the score mentioned above (0.3044460713863373).
> Thanks,
> Kamal
> Original Message:
>
> What is SimilarityQueries? I'd try the explain capabilities to see more.
>
> On May 5, 2009, at 2:23 PM, Kamal Najib wrote:
>
> > Hi all,
> > I got a similarity score of 0.3044460713863373 between two docs that
> > have the same text content. Is that correct? I expected 1.0. Here is
> > my result:
> >
> > doc:  "this expression of galectin-1 in blood vessel walls was correlated with vascular"
> > doc2: "this expression of galectin-1 in blood vessel walls was correlated with vascular"
> > Score: "0.3044460713863373"
> >
> > Is the score correct?
> > My method is:
> >
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.ScoreDoc;
> > import org.apache.lucene.search.similar.SimilarityQueries;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.RAMDirectory;
> >
> > public double getSimilarity(String v1, String v2) throws Exception
> > {
> >     // Index v1 as the only document in a fresh in-memory index.
> >     Directory directory = new RAMDirectory();
> >     Analyzer analyzer = new StandardAnalyzer();
> >     IndexWriter writer = new IndexWriter(directory, analyzer,
> >             true, IndexWriter.MaxFieldLength.LIMITED);
> >     Document doc1 = new Document();
> >     doc1.add(new Field("term", v1, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     writer.addDocument(doc1);
> >     writer.close();
> >
> >     // Build a query from the terms of v2 (all optional) and run it.
> >     IndexSearcher searcher = new IndexSearcher(directory);
> >     try {
> >         Query query = SimilarityQueries.formSimilarQuery(v2, analyzer, "term", null);
> >         ScoreDoc[] scoreDocs = searcher.search(query, 5).scoreDocs;
> >         if (scoreDocs.length == 0) {
> >             return 0.0; // no term of v2 matched the indexed document
> >         }
> >         // Lucene's relevance score for the top hit (not a normalized cosine).
> >         float result = scoreDocs[0].score;
> >         Document hitDoc = searcher.doc(scoreDocs[0].doc);
> >         System.out.println("Term 1: " + v2 + " Term 2: " + hitDoc.get("term")
> >                 + " Score: " + result);
> >         return result;
> >     } finally {
> >         searcher.close();
> >     }
> > }
> > Please help.
> > Thanks in advance,
> > Kamal
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
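The question above assumes the score is a cosine similarity. By contrast, a cosine similarity in the usual sense compares two length-normalized term vectors, so identical texts give 1.0. The following is a minimal plain-Java sketch over raw term frequencies; it does not use Lucene, the class and method names are hypothetical, and its tokenizer (lower-case, split on anything other than letters, digits and hyphens, no stop words) is an assumption that does not match StandardAnalyzer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, independent of Lucene: cosine similarity between the
// raw term-frequency vectors of two strings.
class CosineSketch {

    // Assumed tokenizer: lower-case, then split on any run of characters that
    // are not letters, digits or '-'. No stop words and no stemming.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}-]+")) {
            if (!token.isEmpty()) {          // split() can yield a leading ""
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Sum of squared frequencies: the squared Euclidean length of the vector.
    static double squaredLength(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return sum;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either text has no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> va = termFreqs(a);
        Map<String, Integer> vb = termFreqs(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            Integer other = vb.get(e.getKey());
            if (other != null) {
                dot += e.getValue().doubleValue() * other.doubleValue();
            }
        }
        double denom = Math.sqrt(squaredLength(va) * squaredLength(vb));
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        String s = "this expression of galectin-1 in blood vessel walls was correlated with vascular";
        System.out.println(cosine(s, s));    // prints 1.0
        System.out.println(cosine(s, "galectin-1 in blood vessels"));
    }
}
```

Weighting both vectors the same way (for example with tf-idf instead of raw counts) keeps identical texts identical, so their cosine stays 1.0 as long as the vectors are not all zeros.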
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search