Posted to java-user@lucene.apache.org by tierecke <ni...@gmail.com> on 2007/08/03 15:47:26 UTC

How can I get the Document Frequency for a specific term??? And more questions...

Hi,

Can I know in how many documents a term appears (DF - Document Frequency)?
Does Lucene keep it? Can I retrieve it?

Now - an even more advanced question:
Since I have a 77GB index, I cut it into 25 smaller indices of 3GB each and
I query them using MultiSearcher. Is there a possibility to know the DF of a
term throughout the whole collection or do I need to ask each index for the
DF of a specific term (supposing that my first question is solvable)?

And the last question: Is there a way to know the total number of documents
in a Lucene Index? Is there a way to know the total number of documents in
multiple indexes together?

I hope it's not too much. Suddenly I find myself dealing with stuff I never
dealt with before.

thanks a lot from Amsterdam, 
Nir.

Re: How can I get the Document Frequency for a specific term??? And more questions...

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 3, 2007, at 9:47 AM, tierecke wrote:

>
> Hi,
>
> Can I know in how many documents a term appears (DF - Document Frequency)?
> Does Lucene keep it? Can I retrieve it?
>

See the TermEnum class (IndexReader.terms()); TermEnum.docFreq() gives the
document frequency of the current term.
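
For anyone unsure what "document frequency" counts, here is a minimal sketch
of the idea in plain Java. It is NOT Lucene code: ToyIndex and its methods are
hypothetical names made up for illustration, and the whitespace tokenizer is
an assumption. It only shows that DF is the number of distinct documents in a
term's posting list, and that walking the sorted term dictionary (roughly what
a term enumeration does) yields each term together with its DF.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy in-memory inverted index (hypothetical, not part of Lucene). */
final class ToyIndex {
    // term -> ids of the documents that contain it (the "posting list")
    private final SortedMap<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int numDocs = 0;

    /** Adds a document; tokens are lowercased and split on whitespace. */
    int addDocument(String text) {
        int docId = numDocs++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                // A set, so a term repeated inside one document counts once.
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Document frequency: how many distinct documents contain the term. */
    int docFreq(String term) {
        SortedSet<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Every term in sorted order, mapped to its document frequency. */
    SortedMap<String, Integer> termDocFreqs() {
        SortedMap<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, SortedSet<Integer>> e : postings.entrySet()) {
            out.put(e.getKey(), e.getValue().size());
        }
        return out;
    }

    /** Number of documents added so far. */
    int numDocs() {
        return numDocs;
    }
}
```

Note that a term occurring several times in the same document still adds only
one to its DF; that is the difference between document frequency and term
frequency.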

> Now - an even more advanced question:
> Since I have a 77GB index, I cut it into 25 smaller indices of 3GB each and
> I query them using MultiSearcher. Is there a possibility to know the DF of a
> term throughout the whole collection or do I need to ask each index for the
> DF of a specific term (supposing that my first question is solvable)?
>

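See the MultiReader class and MultiReader.terms(). A MultiReader wraps the
readers of the individual indexes and presents them as a single index.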

> And the last question: Is there a way to know the total number of documents
> in a Lucene Index? Is there a way to know the total number of documents in
> multiple indexes together?

IndexReader.numDocs()
MultiReader.numDocs()
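
To make the multi-index case concrete, here is a small sketch of the
arithmetic in plain Java. Again this is NOT Lucene code: ShardStats and Shard
are hypothetical names, and the sketch assumes every document lives in exactly
one sub-index (true when one big index is cut into pieces). Under that
assumption the collection-wide DF of a term is simply the sum of the
per-index DFs, and the total document count is the sum of the per-index
counts.

```java
import java.util.List;
import java.util.Map;

/** Combines per-index statistics (hypothetical helper, not part of Lucene). */
final class ShardStats {

    /** Statistics of one sub-index: its document count and a term -> DF table. */
    static final class Shard {
        final int numDocs;
        final Map<String, Integer> docFreqs;

        Shard(int numDocs, Map<String, Integer> docFreqs) {
            this.numDocs = numDocs;
            this.docFreqs = docFreqs;
        }
    }

    /** Total number of documents across all sub-indexes. */
    static long totalDocs(List<Shard> shards) {
        long total = 0;
        for (Shard s : shards) {
            total += s.numDocs;
        }
        return total;
    }

    /**
     * Collection-wide document frequency of a term.
     * Assumes the sub-indexes hold disjoint sets of documents, so no
     * document is counted twice.
     */
    static long totalDocFreq(List<Shard> shards, String term) {
        long total = 0;
        for (Shard s : shards) {
            total += s.docFreqs.getOrDefault(term, 0);
        }
        return total;
    }
}
```

If the same document were indexed in more than one sub-index, the summed DF
would overcount, so the disjointness assumption matters.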

>
> I hope it's not too much. Suddenly I find myself dealing with stuff I never
> dealt with before.


Much better than doing the same stuff day after day for life, ain't it?  :-)




--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


