Posted to dev@lucene.apache.org by Dmitry Serebrennikov <dm...@earthlink.net> on 2001/10/19 20:32:05 UTC
Re: Getting word count
If you are referring to the number of documents containing a particular
term, that is available from IndexReader.docFreq(Term t) (the matching
documents themselves come from IndexReader.termDocs(Term t)). However, if
it is anything more complex than a single term (like a phrase or some
other query), I think the only way is to actually run a search on this
query and get the length of the Hits object returned. Slightly more
efficient, but requiring a bit more work, is to create a HitCollector
that uses a BitVector (see org.apache.lucene.util.BitVector) to mark off
documents that the searcher finds. Afterwards you can get the count from
the bit vector. This will skip over sorting that is done in the standard
HitCollector. You cannot simply count the number of times the method
collect() is called on your collector because some queries may result in
the same document being selected more than once and so you'd end up with
a double-count. (Can anyone confirm that this is the case?)
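
To make the bit vector idea concrete, here is a minimal, self-contained sketch. It is not tied to the real Lucene classes: DocCollector is a hypothetical stand-in for the collect(int doc, float score) callback of HitCollector, and java.util.BitSet stands in for org.apache.lucene.util.BitVector. It only illustrates why marking bits avoids the double-count, under the assumption that the same document number may be reported more than once.

```java
import java.util.BitSet;

// Hypothetical stand-in for the collect(int doc, float score) callback
// of a Lucene HitCollector; not the real Lucene type.
interface DocCollector {
    void collect(int doc, float score);
}

// Counts distinct matching documents by marking one bit per document number.
final class DistinctDocCounter implements DocCollector {
    private final BitSet seen;   // stand-in for Lucene's BitVector
    private int calls = 0;       // raw number of collect() invocations

    DistinctDocCounter(int maxDoc) {
        this.seen = new BitSet(maxDoc);
    }

    @Override
    public void collect(int doc, float score) {
        calls++;
        seen.set(doc);           // setting an already-set bit changes nothing
    }

    // Number of distinct documents collected so far.
    int count() {
        return seen.cardinality();
    }

    // Number of times collect() was invoked, duplicates included.
    int calls() {
        return calls;
    }

    public static void main(String[] args) {
        DistinctDocCounter counter = new DistinctDocCounter(10);
        counter.collect(3, 1.0f);
        counter.collect(3, 0.5f); // same document reported a second time
        counter.collect(7, 0.2f);
        System.out.println("collect() calls: " + counter.calls());    // 3
        System.out.println("distinct documents: " + counter.count()); // 2
    }
}
```

With real Lucene you would pass such a collector to the searcher instead of calling collect() by hand; the score argument is ignored here because only the count matters, which is also why no sorting is needed.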
Nioche, Julien wrote:
>Hello All,
>
>I'm trying to get word count information for exact phrases, i.e. to know
>how many times a given form occurs in the index. Does anyone know how I can
>do this in a clean way?
>
>Does it require modifying the score() methods of the different Scorers? Or
>is this information already computed somewhere else?
>
>Thanks a lot for your help
>
>Julien Nioche
>