Posted to java-user@lucene.apache.org by Vishal Bathija <vi...@gmail.com> on 2006/04/11 17:58:20 UTC

getting frequency of a phrase within documents

Hi,
I am using PhraseQuery to get the number of documents that the query
appears in, using the hits. I would like to know if there is any way in
which I can get the number of times a phrase appears within each
document.

I am currently searching for the phrase "avoids deadlock":

// Build an exact phrase query for "avoids deadlock" on the "contents" field
PhraseQuery query = new PhraseQuery();
searcher = new IndexSearcher(rd);
String temp = "avoids";
String temp2 = "deadlock";
Term synset2 = new Term("contents", temp);
Term ss = new Term("contents", temp2);
query.add(synset2);
query.add(ss);
Hits hits = searcher.search(query);
// Hits.length() is the number of matching documents, not phrase occurrences
System.out.println("number of hits=" + hits.length());

Any help would be greatly appreciated.




--
Vishal Bathija
Graduate Student
Department of Computer Science & Systems Analysis
Miami University
Oxford,Ohio
Phone: (513)-461-9239

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: getting frequency of a phrase within documents

Posted by Chris Hostetter <ho...@fucit.org>.
If you use a custom Similarity class, the tf(float) function is used for
phrases to determine how the score is computed from the
number of times the phrase appears in each document.

If you make it an identity function, and modify the other functions in the
Similarity to be (mostly) constant values, then you can probably make a
Similarity class in which the score of each document is the number of times
the phrase appears.

NOTE: this will really only work with exact phrases ... for inexact phrase
matches the value passed to tf is a sum that has already lost information
... the input might be "1.5" but you have no way of knowing if that's 2
sloppy matches with a "freq" of "0.75" each, or 3 sloppy matches with a
"freq" of "0.5" each.
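
The information loss described in the note can be shown with plain arithmetic.
The sketch below does not call Lucene at all; the class name SloppyFreqSum and
the per-match values are made up for illustration (they are the figures from
the note, not values Lucene is guaranteed to produce). It only shows that two
different sets of sloppy matches can add up to the same number, which is all
that tf would receive.

```java
final class SloppyFreqSum {
    // Adds up the per-match contributions; only this total reaches tf(float),
    // so the number of matches behind it cannot be recovered.
    static float sum(float[] perMatch) {
        float total = 0f;
        for (float f : perMatch) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        float[] twoMatches = {0.75f, 0.75f};        // hypothetical: 2 sloppy matches
        float[] threeMatches = {0.5f, 0.5f, 0.5f};  // hypothetical: 3 sloppy matches
        System.out.println(twoMatches.length + " matches sum to " + sum(twoMatches));
        System.out.println(threeMatches.length + " matches sum to " + sum(threeMatches));
    }
}
```

Both totals are the same, so a Similarity that sees only the total cannot
report a match count for sloppy phrases.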

You might be better off using the SpanNearQuery class and its getSpans method.
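
Enumerating spans gives one entry per match, so counting entries per document
gives the phrase frequency directly. As a rough, library-free illustration of
that counting (this is not Lucene's span API, and the class PhraseCounter and
its whitespace tokenization are assumptions made for the example), the sketch
below counts the positions where the phrase terms occur consecutively and in
order within one document's token list.

```java
import java.util.Arrays;
import java.util.List;

final class PhraseCounter {
    // Counts start positions at which the phrase tokens appear consecutively
    // and in order, which corresponds to an exact (zero slop) phrase match.
    static int countExact(List<String> docTokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int start = 0; start + phrase.size() <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical document text, split on spaces instead of using an analyzer.
        List<String> doc = Arrays.asList(
            "the lock ordering avoids deadlock and a timeout also avoids deadlock".split(" "));
        List<String> phrase = Arrays.asList("avoids", "deadlock");
        System.out.println("occurrences=" + countExact(doc, phrase));
    }
}
```

With a real index the tokens and positions would come from the analyzer used
at index time, so the counts could differ from this simplified tokenization.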



: Date: Tue, 11 Apr 2006 11:58:20 -0400
: From: Vishal Bathija <vi...@gmail.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: getting frequency of a phrase within documents



-Hoss

