Posted to java-user@lucene.apache.org by Karl Wettin <ka...@gmail.com> on 2007/08/31 04:49:27 UTC
IndexReader#docFreq(Term)
I was running into some problems that turned out to be caused by an
undocumented feature. Here is a javadoc suggestion:
- /** Returns the number of documents containing the term <code>t</code>.
+ /** Returns the number of documents, including deleted, containing the term <code>t</code>.
* @throws IOException if there is a low-level IO error
*/
public abstract int docFreq(Term t) throws IOException;
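To make the distinction concrete, here is a hypothetical toy model in plain Java. It is not Lucene's API, and the class `DocFreqDemo` and its methods are made up for illustration. It assumes that deleting a document only flags it and leaves the posting lists intact. Under that assumption, `docFreq` reports the posting-list length with deleted documents included, while `liveDocFreq` counts only the documents a TermDocs-style walk would return.

```java
import java.util.*;

// Hypothetical toy model, NOT Lucene's API: deletions only flag a doc id,
// they never shrink a posting list.
class DocFreqDemo {
    // term -> ids of the documents containing it (deleted ones stay here)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // deleted doc ids are only marked in this bit set
    private final BitSet deleted = new BitSet();
    private int maxDoc = 0;

    /** Adds a document made of the given terms and returns its doc id. */
    int addDocument(String... terms) {
        int doc = maxDoc++;
        // de-duplicate so each term lists this doc at most once
        for (String t : new LinkedHashSet<>(Arrays.asList(terms))) {
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(doc);
        }
        return doc;
    }

    /** Flags a document as deleted without touching any posting list. */
    void deleteDocument(int doc) {
        deleted.set(doc);
    }

    /** Like docFreq: the raw posting-list length, deleted docs included. */
    int docFreq(String term) {
        List<Integer> p = postings.get(term);
        return p == null ? 0 : p.size();
    }

    /** Like walking TermDocs: counts only documents that are not deleted. */
    int liveDocFreq(String term) {
        int count = 0;
        for (int doc : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.get(doc)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        DocFreqDemo idx = new DocFreqDemo();
        int a = idx.addDocument("lucene", "java");
        idx.addDocument("lucene");
        idx.deleteDocument(a);
        System.out.println(idx.docFreq("lucene"));     // 2
        System.out.println(idx.liveDocFreq("lucene")); // 1
    }
}
```

The gap between the two counts is exactly the number of deleted documents still sitting in the term's posting list.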
I understand why, but I wonder whether or not this feature is actually
used by something: do I need to mimic the behaviour in LUCENE-550 in
order to ensure compatibility?
--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: IndexReader#docFreq(Term)
Posted by Michael Busch <bu...@gmail.com>.
Chris Hostetter wrote:
>
> unless i'm mistaken, docFreq isn't the only method affected by deleted
> docs, things like termDocs, termPositions, terms, ... pretty much all of
> the IndexReader methods work that way (even getFieldNames could be
> misleading if the only doc with a field of that name has been deleted)
>
TermDocs and TermPositions do take deleted docs into account.
The problem with TermEnum and docFreq is that a term doesn't get deleted
from the dictionary, even if its posting list only contains deleted
docs. Avoiding this would be quite inefficient.
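A hypothetical sketch of what Michael describes, under toy assumptions. The class `TermDictionaryDemo` and its methods are made up and are not Lucene's API. The sorted term dictionary keeps a term even after every document containing it is deleted. Finding the terms that still have a live document means scanning each posting list, which hints at why pruning the dictionary on every delete would be costly.

```java
import java.util.*;

// Hypothetical toy model, NOT Lucene's API: a sorted term dictionary whose
// entries survive deletions, because deleting a doc only flags its id.
class TermDictionaryDemo {
    private final SortedMap<String, List<Integer>> dict = new TreeMap<>();
    private final BitSet deleted = new BitSet();
    private int maxDoc = 0;

    /** Adds a document made of the given terms and returns its doc id. */
    int addDocument(String... terms) {
        int doc = maxDoc++;
        for (String t : new LinkedHashSet<>(Arrays.asList(terms))) {
            dict.computeIfAbsent(t, k -> new ArrayList<>()).add(doc);
        }
        return doc;
    }

    /** Flags a document as deleted; the dictionary is left untouched. */
    void deleteDocument(int doc) {
        deleted.set(doc);
    }

    /** Every term in sorted order, like a raw TermEnum walk. */
    List<String> terms() {
        return new ArrayList<>(dict.keySet());
    }

    /** Terms with at least one live doc; each check scans a posting list. */
    List<String> liveTerms() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : dict.entrySet()) {
            for (int doc : e.getValue()) {
                if (!deleted.get(doc)) {
                    out.add(e.getKey());
                    break; // one live doc is enough to keep the term
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TermDictionaryDemo d = new TermDictionaryDemo();
        int a = d.addDocument("apache", "obsolete");
        d.addDocument("apache", "lucene");
        d.deleteDocument(a);
        System.out.println(d.terms());     // [apache, lucene, obsolete]
        System.out.println(d.liveTerms()); // [apache, lucene]
    }
}
```

In this sketch the stale entry is filtered lazily, only when someone asks. The alternative, eager removal, would require checking every term of each deleted document on every delete.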
Re: IndexReader#docFreq(Term)
Posted by Chris Hostetter <ho...@fucit.org>.
> - /** Returns the number of documents containing the term <code>t</code>.
> + /** Returns the number of documents, including deleted, containing the term <code>t</code>.
there is a note about this in the javadocs for deleteDocument, but i agree
it's not entirely clear ...
unless i'm mistaken, docFreq isn't the only method affected by deleted
docs, things like termDocs, termPositions, terms, ... pretty much all of
the IndexReader methods work that way (even getFieldNames could be
misleading if the only doc with a field of that name has been deleted)
rather than putting a specific note about it in each method, it might
make more sense to elaborate on this in the class-level docs, and then
note only the exceptions (document(int) is the only one i can think of ...
it throws an exception)
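Here is a hypothetical sketch of the exception Hoss mentions, using the same toy assumptions. In it, `document(int)` refuses a deleted id. Throwing `IllegalArgumentException` is an assumption made for the sketch. A caller therefore walks `0..maxDoc()-1` and checks `isDeleted` first. The method names `maxDoc` and `isDeleted` mirror IndexReader methods, but `StoredDocsDemo` itself is made up and is not Lucene's API.

```java
import java.util.*;

// Hypothetical toy model, NOT Lucene's API: stored documents addressed by
// doc id, where deleted ids are flagged rather than removed.
class StoredDocsDemo {
    private final List<String> stored = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    /** Stores a document body and returns its doc id. */
    int addDocument(String body) {
        stored.add(body);
        return stored.size() - 1;
    }

    void deleteDocument(int doc) { deleted.set(doc); }

    boolean isDeleted(int doc) { return deleted.get(doc); }

    /** One past the highest doc id ever assigned, deleted ids included. */
    int maxDoc() { return stored.size(); }

    /** Unlike the other accessors, refuses deleted ids (exception type assumed). */
    String document(int doc) {
        if (deleted.get(doc)) {
            throw new IllegalArgumentException("attempt to access a deleted document");
        }
        return stored.get(doc);
    }

    /** Collects every live stored document, checking isDeleted first. */
    List<String> liveDocuments() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (!isDeleted(i)) out.add(document(i));
        }
        return out;
    }

    public static void main(String[] args) {
        StoredDocsDemo s = new StoredDocsDemo();
        int a = s.addDocument("first");
        s.addDocument("second");
        s.deleteDocument(a);
        System.out.println(s.liveDocuments()); // [second]
    }
}
```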
> I understand why, but wonder whether or not this feature is actually used by
> something, if I need to mimic the behaviour in LUCENE-550 in order to ensure
> compatibility?
I don't know if i'd consider it a "feature" ... i think of it more as a
caveat that lucene makes in order to be able to optimize some things ... i
don't think it's necessary that alternate impls mimic this behavior
exactly.
-Hoss