Posted to dev@lucene.apache.org by Murat Yakici <Mu...@cis.strath.ac.uk> on 2009/06/22 10:55:15 UTC

Term collection frequency

Hi,

As far as I know, there is no public API for getting a term's collection
frequency in Lucene, short of writing routines with TFV or TermEnum.
Does Lucene store the number of times a term occurs in the index? If so,
can someone point me to the low-level API where I can get this
information through some extension? If not, I imagine this would
require a change to the index format. Which classes should I be
dealing with, and what should I be careful about when implementing such a
change?


Cheers,
Murat Yakici
Department of Computer & Information Sciences
University of Strathclyde
Glasgow, UK
-------------------------------------------
The University of Strathclyde is a charitable body, registered in Scotland,
with registration number SC015263.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Term collection frequency

Posted by Michael McCandless <lu...@mikemccandless.com>.
There's IndexReader.docFreq(Term), which returns the number of
documents that the term occurs in (deleted documents are still counted
until their segments are merged).

But the global count of how many times a Term occurred across all docs
is not stored.

You'd have to get a TermDocs enum for that Term, iterate through all
docs, and sum up the freq() from each doc, to compute that, I believe.
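
As a rough sketch of that loop: the Postings interface below is a
hypothetical stand-in that mirrors only the next() and freq() methods of
TermDocs, and fromFreqs() is an in-memory helper invented so the example
runs without an index. Neither is part of Lucene.

```java
public class CollectionFrequency {

    /** Hypothetical stand-in for the two TermDocs methods used here. */
    public interface Postings {
        boolean next(); // advance to the next document containing the term
        int freq();     // occurrences of the term in the current document
    }

    /** Sums the per-document frequencies to get the collection frequency. */
    public static long collectionFrequency(Postings postings) {
        long total = 0;
        while (postings.next()) {
            total += postings.freq();
        }
        return total;
    }

    /** In-memory postings, present only to make the sketch runnable. */
    public static Postings fromFreqs(final int... freqs) {
        return new Postings() {
            private int i = -1;
            public boolean next() { return ++i < freqs.length; }
            public int freq() { return freqs[i]; }
        };
    }

    public static void main(String[] args) {
        // A term that occurs 3, 1 and 4 times in three documents.
        System.out.println(collectionFrequency(fromFreqs(3, 1, 4)));
    }
}
```

Against a real index, the same loop would presumably run over the
TermDocs returned by IndexReader.termDocs(Term), with close() called on
it afterwards. Note that this costs one pass over the term's postings
per lookup, so it is linear in the term's document frequency.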

Mike

