Posted to java-user@lucene.apache.org by Christian Reuschling <ch...@gmail.com> on 2009/01/07 16:09:11 UTC

Determining index term count

Is there a fast way to determine the total number of terms inside an index?

So far, the only way I have found is to walk through the whole TermEnum, i.e.

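// Count every term by enumerating the whole term dictionary.
TermEnum termEnum4TermCount = reader.terms();
int iTermCount = 0;

try {
   while (termEnum4TermCount.next())
      iTermCount++;
} finally {
   termEnum4TermCount.close();   // release the enumeration even if next() throws
}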


Thanks for all answers!

Christian

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Determining index term count

Posted by Andrzej Bialecki <ab...@getopt.org>.
Greg Shackles wrote:
> I'm not sure offhand how to write the code to do it, but I know when you
> open an index in Luke, that is one of the numbers it gives you.  If you want
> to just get the number once that would be an easy way to do it.  If you want
> the code for it, Luke is open source so you could see how they do it.  (I
> used Luke as a starting point at one point for seeing how to get a list of
> high frequency terms).

Luke currently uses the same method as in the original question, i.e. it
creates a TermEnum and traverses all terms. This is fast enough and doesn't
require access to implementation details.

There is a faster way to do it, but it's not exposed through the API.
SegmentReader (a concrete impl. of IndexReader) opens a TermInfosReader,
which has a field SegmentTermEnum:indexEnum, which in turn has a field
"size", and this is the number of terms. Accessing the information this
way would be messy, since it relies on non-public internals. It's better
to propose that this information be added to the API.
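
One caveat worth illustrating: a SegmentReader covers a single segment, so
for an index with several segments, adding up each segment's "size" would
count a term once per segment in which it occurs, whereas walking a merged
TermEnum counts it once. The sketch below is a toy model of that difference
using plain Java collections. It is not Lucene code: the class and method
names are made up for illustration, and terms are simplified to plain
strings (real Lucene terms are field/text pairs).

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: each "segment" is a sorted set of term strings.
// None of these names are Lucene classes.
class TermCountSketch {

    // Sum of per-segment term counts (what adding up each segment's "size" would give).
    static long sumOfSegmentSizes(List<SortedSet<String>> segments) {
        long total = 0;
        for (SortedSet<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    // Distinct terms across all segments (what a merged enumeration counts).
    static long distinctTermCount(List<SortedSet<String>> segments) {
        SortedSet<String> merged = new TreeSet<String>();
        for (SortedSet<String> segment : segments) {
            merged.addAll(segment);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        SortedSet<String> seg1 = new TreeSet<String>(Arrays.asList("apple", "index", "term"));
        SortedSet<String> seg2 = new TreeSet<String>(Arrays.asList("index", "lucene"));
        List<SortedSet<String>> segments = Arrays.asList(seg1, seg2);
        System.out.println(sumOfSegmentSizes(segments));   // 5: "index" counted twice
        System.out.println(distinctTermCount(segments));   // 4
    }
}
```

Under this model the per-segment sum is only an upper bound unless the
index consists of a single segment (for example, after an optimize).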

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: Determining index term count

Posted by Greg Shackles <gs...@gmail.com>.
I'm not sure offhand how to write the code to do it, but I know when you
open an index in Luke, that is one of the numbers it gives you.  If you want
to just get the number once that would be an easy way to do it.  If you want
the code for it, Luke is open source so you could see how they do it.  (I
used Luke as a starting point at one point for seeing how to get a list of
high frequency terms).

- Greg
