Posted to java-user@lucene.apache.org by Moshe Cohen <mo...@gmail.com> on 2009/05/10 23:37:25 UTC

Deleted files considered for scoring

Hi,
I am using Lucene 2.4.1 via Pylucene and have encountered the following
behavior:
When there are deleted documents in the index, the search scores are
identical to what they would be had those documents not been deleted.
If I optimize the index and the deleted documents are actually removed, the
scoring is the same as if those documents had never been indexed at all.

Is this a bug or am I missing something?
Optimizing is not a feasible option for my use case, where indexing
operations are as frequent as searches and the two are interleaved.

Re: Deleted files considered for scoring

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, May 10, 2009 at 5:37 PM, Moshe Cohen <mo...@gmail.com> wrote:
> I am using Lucene 2.4.1 via Pylucene and have encountered the following
> behavior:
> When there are deleted documents in the index, the search scores are
> identical to what they would be had those documents not been deleted.
> If I optimize the index and the deleted documents are actually removed, the
> scoring is the same as if those documents had never been indexed at all.

This is working as designed... a known design tradeoff / limitation.
When a document is marked as deleted, the document frequencies of its
terms don't change (changing them would be impractical).

-Yonik
http://www.lucidimagination.com
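
To make the effect concrete, here is a minimal sketch in plain Python (no
Lucene calls). It assumes the classic idf formula used by Lucene 2.x's
DefaultSimilarity, idf = 1 + ln(maxDoc / (docFreq + 1)), where both maxDoc
and docFreq still count documents that are only marked as deleted. The
index size and term counts below are hypothetical, chosen only for
illustration.

```python
import math


def idf(doc_freq, max_doc):
    """Classic Lucene-style idf: 1 + ln(max_doc / (doc_freq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical index: 1000 documents, 100 of which contain the term.
max_doc = 1000
doc_freq = 100

# Mark 90 of the matching documents as deleted, without merging segments.
deleted = 90

# Deletions are only flags, so the statistics used for scoring are unchanged.
idf_with_deletes = idf(doc_freq, max_doc)

# After optimize (or any merge that drops the deleted documents), the
# statistics reflect only the surviving documents.
idf_after_merge = idf(doc_freq - deleted, max_doc - deleted)

print("idf with pending deletes:", idf_with_deletes)
print("idf after merge:         ", idf_after_merge)
```

In this sketch the term becomes much rarer once the deleted documents are
physically removed, so its idf (and any score that depends on it) rises.
That is the difference described in the original question.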
