Posted to issues@lucene.apache.org by "Michael Sokolov (Jira)" <ji...@apache.org> on 2021/01/22 22:13:00 UTC

[jira] [Created] (LUCENE-9695) Don't include deleted documents when merging vectors

Michael Sokolov created LUCENE-9695:
---------------------------------------

             Summary: Don't include deleted documents when merging vectors
                 Key: LUCENE-9695
                 URL: https://issues.apache.org/jira/browse/LUCENE-9695
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael Sokolov


While testing HNSW searches with multi-segment indexes, I saw all kinds of strange behavior; recall was radically different for a force-merged multi-segment index than for the same index built as a single segment. Most testing I've done to date has been with single-segment indexes, shame on me.

One issue is that when merging we iterate over all the vectors from 0 .. size-1. But this size was being calculated without taking deletions into account, so deleted vectors were included in the graph, leading to exceptions and weird inconsistencies.
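
To illustrate the first issue, here is a minimal sketch in plain Java, not the actual Lucene merge code. The class LiveVectorMerge, the method mergeLive, and the use of java.util.BitSet as a stand-in for a segment's live-docs bitmap are all assumptions made for illustration. The point is only that the merged size should come from the live documents, not from the raw vector count of the segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class LiveVectorMerge {

    /**
     * Hypothetical helper: copies only the vectors whose document is still
     * live. The BitSet stands in for a segment's live-docs bitmap
     * (bit set = document not deleted).
     */
    static List<float[]> mergeLive(float[][] vectors, BitSet liveDocs) {
        List<float[]> merged = new ArrayList<>();
        for (int doc = 0; doc < vectors.length; doc++) {
            if (liveDocs.get(doc)) {
                merged.add(vectors[doc]);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        float[][] segment = { {1f, 0f}, {0f, 1f}, {1f, 1f} };
        BitSet live = new BitSet(segment.length);
        live.set(0);
        live.set(2); // document 1 has been deleted

        List<float[]> merged = mergeLive(segment, live);
        // The merged size is the live count (2), not segment.length (3).
        System.out.println(merged.size());
    }
}
```

Iterating 0 .. merged.size()-1 over this list can never reach a deleted vector, whereas iterating 0 .. segment.length-1 would.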

The other issue has to do with aliasing in the diverse neighbor selection heuristic for graph construction that was introduced recently. Sometimes the vectors to be compared would be drawn from the same VectorValues, but this is a no-no since they are then the same vector (the first one is overwritten when the second one is fetched). This leads to poor results rather than errors per se, but the results also became unpredictable in a way that caused the test written to reproduce the first issue to fail. Thus I'll include both fixes together.
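
The aliasing problem can be reproduced with a small stand-alone sketch. The class ReusingSource below is a hypothetical stand-in for a vector reader that returns the same scratch buffer on every call; it is not Lucene's VectorValues, and the copy-based fix shown is just one way to avoid the overwrite (reading the two vectors from two independent sources is another), not necessarily what the patch does.

```java
public class VectorAliasingDemo {

    /** Hypothetical reader that reuses one scratch buffer for every vector. */
    static final class ReusingSource {
        private final float[][] data;
        private final float[] scratch;

        ReusingSource(float[][] data) {
            this.data = data;
            this.scratch = new float[data[0].length];
        }

        /** Returns the shared buffer, overwriting whatever it held before. */
        float[] vectorValue(int ord) {
            System.arraycopy(data[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Buggy: both references point at the same buffer, so the result is 0. */
    static float aliasedDistance(float[][] data, int i, int j) {
        ReusingSource src = new ReusingSource(data);
        float[] a = src.vectorValue(i);
        float[] b = src.vectorValue(j); // overwrites the contents of a
        return squaredDistance(a, b);
    }

    /** One possible fix: copy the first vector before fetching the second. */
    static float safeDistance(float[][] data, int i, int j) {
        ReusingSource src = new ReusingSource(data);
        float[] a = src.vectorValue(i).clone();
        float[] b = src.vectorValue(j);
        return squaredDistance(a, b);
    }

    public static void main(String[] args) {
        float[][] data = { {0f, 0f}, {3f, 4f} };
        System.out.println(aliasedDistance(data, 0, 1)); // prints 0.0
        System.out.println(safeDistance(data, 0, 1));    // prints 25.0
    }
}
```

Because the aliased comparison always sees a distance of zero, no exception is thrown; a distance-based neighbor selection heuristic simply makes poor choices, which matches the "poor results, but not errors" symptom described above.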



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org