Posted to java-user@lucene.apache.org by Karl Wettin <ka...@gmail.com> on 2007/08/31 04:49:27 UTC

IndexReader#docFreq(Term)

I was running into some problems that turned out to be an
undocumented feature. Here is a javadoc suggestion:

-  /** Returns the number of documents containing the term <code>t</code>.
+  /** Returns the number of documents, including deleted, containing the term <code>t</code>.
    * @throws IOException if there is a low-level IO error
    */
   public abstract int docFreq(Term t) throws IOException;


I understand why, but I wonder whether this feature is actually
used by anything, and whether I need to mimic the behaviour in
LUCENE-550 in order to ensure compatibility?


-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: IndexReader#docFreq(Term)

Posted by Michael Busch <bu...@gmail.com>.

Chris Hostetter wrote:

> 
> unless i'm mistaken, docFreq isn't the only method affected by deleted
> docs, things like termDocs, termPositions, terms, ... pretty much all of
> the IndexReader methods work that way (even getFieldNames could be
> misleading if the only doc with a field of that name has been deleted)
> 

TermDocs and TermPositions do take deleted docs into account.

The problem with TermEnum and docFreq is that a term doesn't get deleted
from the dictionary, even if its posting list contains only deleted
docs. Avoiding this would be quite inefficient.
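
To illustrate the difference, here is a minimal toy model. It is not
Lucene's implementation or API: ToyIndex, liveDocFreq and the in-memory
maps are hypothetical names invented for this sketch. It assumes that
deletes only set a flag per document and leave posting lists untouched,
so the stored frequency is cheap but counts deleted docs, while an
accurate count has to walk the postings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Toy model only; NOT Lucene code. All names here are hypothetical.
class ToyIndex {
    // term -> ids of documents containing it; never shrinks on delete
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // one bit per document, set when the document is marked deleted
    private final BitSet deleted = new BitSet();
    private int nextDocId = 0;

    int addDocument(String... terms) {
        int docId = nextDocId++;
        // de-duplicate so each document appears once per posting list
        for (String term : new LinkedHashSet<>(Arrays.asList(terms))) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Deleting only flips a bit; the term dictionary is left alone.
    void deleteDocument(int docId) {
        deleted.set(docId);
    }

    // Cheap lookup: size of the stored posting list, deleted docs included.
    int docFreq(String term) {
        List<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Accurate count: walks the posting list and skips deleted docs.
    int liveDocFreq(String term) {
        List<Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        int count = 0;
        for (int docId : docs) {
            if (!deleted.get(docId)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("lucene", "index");
        int second = index.addDocument("lucene");
        index.deleteDocument(second);
        System.out.println(index.docFreq("lucene"));     // prints 2
        System.out.println(index.liveDocFreq("lucene")); // prints 1
    }
}
```

The walk in liveDocFreq is linear in the length of the posting list,
which is the kind of cost docFreq avoids by reporting the stored count.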


Re: IndexReader#docFreq(Term)

Posted by Chris Hostetter <ho...@fucit.org>.

> -  /** Returns the number of documents containing the term <code>t</code>.
> +  /** Returns the number of documents, including deleted, containing the term <code>t</code>.

there is a note about this in the javadocs for deleteDocument, but i agree 
it's not entirely clear ...

unless i'm mistaken, docFreq isn't the only method affected by deleted 
docs, things like termDocs, termPositions, terms, ... pretty much all of 
the IndexReader methods work that way (even getFieldNames could be 
misleading if the only doc with a field of that name has been deleted)

rather than putting a specific note about it in each method, it might 
make more sense to elaborate on this in the class level docs, and then 
note only the exceptions (document(int) is the only one i can think of ... 
it throws an exception)

> I understand why, but I wonder whether this feature is actually used by 
> anything, and whether I need to mimic the behaviour in LUCENE-550 in order 
> to ensure compatibility?

I don't know if i'd consider it a "feature" ... i think of it more as a 
caveat that lucene accepts in order to be able to optimize some things ... 
i don't think it's necessary that alternate impls mimic this behavior 
exactly.

-Hoss
