Posted to dev@lucene.apache.org by Dmitry Serebrennikov <dm...@earthlink.net> on 2001/10/19 20:32:05 UTC

Re: Getting word count

If you are referring to the number of documents containing a particular 
term, that is available from IndexReader.docFreq(Term t) (or by walking 
IndexReader.termDocs(Term t) if you also need the documents). However, if 
it is anything more complex than a single term (like a phrase or some 
other query), I think the only way is to actually run a search on this 
query and get the length of the Hits object returned. Slightly more 
efficient, but requiring a bit more work, is to create a HitCollector 
that uses a BitVector (see org.apache.lucene.util.BitVector) to mark off 
documents that the searcher finds. Afterwards you can get the count from 
the bit vector. This skips the sorting that is done in the standard 
HitCollector. You cannot simply count the number of times collect() is 
called on your collector, because some queries may select the same 
document more than once, and you'd end up with a double count. (Can 
anyone confirm that this is the case?)
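
A rough sketch of the bit-marking idea, written without any Lucene 
dependency so it can be run on its own. The Collector interface here is a 
hypothetical stand-in that mirrors HitCollector's collect(int doc, float 
score) callback, and java.util.BitSet stands in for 
org.apache.lucene.util.BitVector; both substitutions are assumptions for 
illustration, not the actual Lucene classes.

```java
import java.util.BitSet;

public class DistinctHitCount {
    /** Hypothetical stand-in for Lucene's HitCollector callback (an assumption). */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Marks each collected document id in a bit set; a repeated id sets the same bit again. */
    static final class CountingCollector implements Collector {
        private final BitSet seen; // java.util.BitSet used in place of Lucene's BitVector

        CountingCollector(int maxDoc) {
            seen = new BitSet(maxDoc);
        }

        @Override
        public void collect(int doc, float score) {
            seen.set(doc); // the score is ignored, so no sorting is needed
        }

        /** Number of distinct documents marked so far. */
        int count() {
            return seen.cardinality();
        }
    }

    public static void main(String[] args) {
        CountingCollector collector = new CountingCollector(10);
        int[] reportedDocs = {3, 7, 3, 1}; // document 3 is reported twice
        for (int doc : reportedDocs) {
            collector.collect(doc, 1.0f);
        }
        System.out.println(collector.count()); // prints 3
    }
}
```

With real Lucene you would pass such a collector to the searcher instead 
of feeding it document numbers by hand. Even if collect() is called twice 
for the same document, the count stays correct.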

Nioche, Julien wrote:

>Hello All,
>
>I'm trying to get word count information for exact phrases, i.e. to know
>how many times a given form occurs in the index. Does anyone know how I can
>do this in a clean way? 
>
>Does it require modifying the score() methods of the different Scorers? Or
>is this information already computed somewhere else?
>
>Thanks a lot for your help
>
>Julien Nioche
>