Posted to java-user@lucene.apache.org by tierecke <ni...@gmail.com> on 2007/08/03 15:47:26 UTC
How can I get the Document Frequency for a specific term??? And more questions...
Hi,
Can I find out how many documents a term appears in (DF, Document Frequency)?
Does Lucene store it? Can I retrieve it?
Now, an even more advanced question:
Since I have a 77GB index, I split it into 25 smaller indices of about 3GB each and
I query them using MultiSearcher. Is there a way to get the DF of a
term across the whole collection, or do I need to ask each index for the
DF of that term separately (assuming my first question is solvable)?
And the last question: Is there a way to get the total number of documents
in a Lucene index? Is there a way to get the total number of documents in
multiple indexes together?
I hope it's not too much. Suddenly I find myself dealing with stuff I've never
dealt with before.
Thanks a lot from Amsterdam,
Nir.
--
View this message in context: http://www.nabble.com/How-can-I-get-the-Document-Frequency-for-a-specific-term----And-more-questions...-tf4212615.html#a11983532
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: How can I get the Document Frequency for a specific term??? And more questions...
Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 3, 2007, at 9:47 AM, tierecke wrote:
>
> Hi,
>
> Can I know in how many documents a term appears (DF - Document Frequency)?
> Does Lucene keep it? Can I retrieve it?
>
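See the TermEnum class (IndexReader.terms()). TermEnum.docFreq() returns the DF of
the term the enumeration is currently positioned on; for a single known term,
IndexReader.docFreq(Term) returns it directly.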
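Lucene keeps the DF of each term in its term dictionary, so the calls above read a stored count rather than scanning documents. To make clear what that count means, here is a minimal stdlib-only Java model. It is not Lucene code, and the class `ToyDocFreq` with its whitespace-splitting "analyzer" is hypothetical. The point it shows is that DF counts documents containing a term, not total occurrences.

```java
import java.util.*;

/**
 * Toy in-memory inverted index (hypothetical, NOT Lucene's API) illustrating
 * document frequency: the number of distinct documents containing a term.
 */
public class ToyDocFreq {
    // term -> ids of the documents that contain it (a minimal postings list)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int numDocs = 0;

    /** Adds one document; lower-cases and splits on whitespace (assumed analyzer). */
    public void addDocument(String text) {
        int docId = numDocs++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue; // leading whitespace yields an empty token
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    /** Analogue of IndexReader.docFreq(Term): documents containing the term. */
    public int docFreq(String term) {
        Set<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    public int numDocs() {
        return numDocs;
    }

    public static void main(String[] args) {
        ToyDocFreq idx = new ToyDocFreq();
        idx.addDocument("lucene lucene lucene"); // three occurrences, one document
        idx.addDocument("apache lucene");
        idx.addDocument("amsterdam");
        System.out.println(idx.docFreq("lucene"));  // 2
        System.out.println(idx.docFreq("missing")); // 0
    }
}
```

Because the postings are stored as a set of document ids, repeated occurrences inside one document collapse to a single entry, which is exactly the difference between DF and term frequency.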
> Now - an even more advanced question:
> Since I have a 77GB index, I cut it into 25 smaller indices of 3GB each and
> I query them using MultiSearcher. Is there a possibility to know the DF of a
> term throughout the whole collection or do I need to ask each index for the
> DF of a specific term (supposing that my first question is solvable).
>
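See MultiReader and MultiReader.terms(). A MultiReader opened over all 25 indices
reports the DF summed across them, so you don't have to query each index yourself.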
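In the Lucene 2.x API the combined lookup would look roughly like `new MultiReader(readers).docFreq(new Term("body", "amsterdam"))`, where `readers` is an `IndexReader[]` over the 25 indices and the field name `"body"` is only a placeholder. The stdlib-only model below (hypothetical, not Lucene code) shows why summing works: each document lives in exactly one shard, so the collection-wide DF is the sum of the per-shard DFs. This assumes the shards are disjoint; a document indexed into two shards would be counted twice.

```java
import java.util.*;

/**
 * Stdlib-only model (hypothetical, NOT Lucene code) of combining DF across
 * disjoint indexes. Each shard is a list of documents; each document is the
 * list of its terms.
 */
public class ShardedDocFreq {
    /** DF of a term within one shard: documents whose term list contains it. */
    static int shardDocFreq(List<List<String>> shard, String term) {
        int df = 0;
        for (List<String> doc : shard) {
            if (doc.contains(term)) df++; // repeated terms in one doc count once
        }
        return df;
    }

    /** Collection-wide DF: the sum over shards, analogous to MultiReader.docFreq. */
    static int totalDocFreq(List<List<List<String>>> shards, String term) {
        int total = 0;
        for (List<List<String>> shard : shards) {
            total += shardDocFreq(shard, term);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> shardA = Arrays.asList(
                Arrays.asList("lucene", "java"),
                Arrays.asList("amsterdam"));
        List<List<String>> shardB = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"));
        List<List<List<String>>> shards = Arrays.asList(shardA, shardB);
        System.out.println(totalDocFreq(shards, "lucene")); // 2
    }
}
```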
> And the last question: Is there a way to know the total number of documents
> in a Lucene Index? Is there a way to know the total number of documents in
> multiple indexes together?
IndexReader.numDocs() for a single index.
MultiReader.numDocs() for multiple indexes together.
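A related detail: IndexReader also has maxDoc(), which counts every document slot including deleted ones, while numDocs() leaves deleted documents out. The stdlib-only model below (hypothetical, not Lucene code; the `Shard` class and its inputs are made up for illustration) shows both counts per index and how a MultiReader-style total simply adds them up across the 25 indices.

```java
/**
 * Stdlib-only model (hypothetical, NOT Lucene code) of the counts behind
 * IndexReader.numDocs() and IndexReader.maxDoc(), summed across indexes.
 */
public class ShardedDocCount {
    /** One index: total document slots and how many are deleted (assumed inputs). */
    static final class Shard {
        final int maxDoc;
        final int deleted;

        Shard(int maxDoc, int deleted) {
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        /** Live documents only, like IndexReader.numDocs(). */
        int numDocs() {
            return maxDoc - deleted;
        }
    }

    /** Live documents across all shards, analogous to MultiReader.numDocs(). */
    static int totalNumDocs(Shard... shards) {
        int total = 0;
        for (Shard s : shards) total += s.numDocs();
        return total;
    }

    /** All slots across all shards, analogous to MultiReader.maxDoc(). */
    static int totalMaxDoc(Shard... shards) {
        int total = 0;
        for (Shard s : shards) total += s.maxDoc;
        return total;
    }

    public static void main(String[] args) {
        Shard a = new Shard(1000, 10); // 10 documents marked deleted
        Shard b = new Shard(500, 0);
        System.out.println(totalNumDocs(a, b)); // 1490
        System.out.println(totalMaxDoc(a, b));  // 1500
    }
}
```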
>
> I hope it's not too much. Suddenly I find myself dealing with stuff I never
> dealt before.
Much better than doing the same stuff day after day for life, ain't
it? :-)
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ