You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Paul Lynch <pa...@yahoo.com> on 2009/01/15 13:49:49 UTC

Term Frequency and IndexSearcher

Hi,
 
I know it is very easy to get the frequency of a given term using the indexReader but I am looking to perform an index search and would like to get the frequency of the given term in the result set. Is this possible?
 
Thanks in advance,
Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Term Frequency and IndexSearcher

Posted by Murat Yakici <Mu...@cis.strath.ac.uk>.
Hi Paul,
I am tempted to suggest the following ( I am assuming here that the
document and the particular fields are TFVed when indexing):
 For every doc in the result set:
   - get the doc id
   - using the doc id, get the TermFreqVector of this document from the
index reader (tfv=ireader.getTermFreqVector(int docNumber, String
field) )
   - get the index number of the term by int index=tfv.indexOf(String term)
   - use this index number to get the frequency of the term by int
freq=tfv.getTermFrequencies()[index]

Remember, this frequency is the field frequency (number of times term T
occurs in field F in document D). If you are after the term's frequency in
document D (which includes all the fields), then you have to do the above
for every field that you are interested or the document has. This approach
has some drawbacks, loading TFV is slower than direct access. So if you
have like a thousand documents in the result set, you may want to put a
threshold or cache some of the most repeated documents/term pairs.

Having said that, there may be a better and efficient approach.

Murat

> Hi,
>  
> I know it is very easy to get the frequency of a given term using the
> indexReader but I am looking to perform an index search and would like to
> get the frequency of the given term in the result set. Is this possible?
>  
> Thanks in advance,
> Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


Murat Yakici
Department of Computer & Information Sciences
University of Strathclyde
Glasgow, UK
-------------------------------------------
The University of Strathclyde is a charitable body, registered in Scotland,
with registration number SC015263.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Term Frequency and IndexSearcher

Posted by Chris Hostetter <ho...@fucit.org>.
: References:
:     <OF...@us.ibm.com>
:     <19...@webmail.cis.strath.ac.uk>
: Date: Thu, 15 Jan 2009 04:49:49 -0800 (PST)
: Subject: Term Frequency and IndexSearcher

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org