You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/08/23 19:48:42 UTC

[GitHub] [lucene] jmazanec15 commented on issue #11354: Reuse HNSW graphs when merging segments? [LUCENE-10318]

jmazanec15 commented on issue #11354:
URL: https://github.com/apache/lucene/issues/11354#issuecomment-1224742978

> Removing nodes and repairing the graph could be a nice direction. But for now we can keep things simple and assume there's a segment without deletes. If that's looking good and shows a nice improvement in index/ merge benchmarks, then we can handle deletes in a follow-up.

Thanks @mayya-sharipova @jtibshirani. I started working on this this week. Hopefully will have some experimental results by the end of this week or early next week. One potential issue I see is that the scores of the neighbors are not available through the KnnVectorReaders graphs at merge time. In the version Im working on now, I just recompute the scores
for the graph that will be retained in the merge process. However, this will add some overhead. Short of serializing the neighbor scores with the graph, do you have any other ideas for avoiding score re-computation?

> Another idea I played with at one point was to preserve all the graphs from the existing segments (remapping their ordinals) and linking them together with additional links. But a lot of links needed to be created in order to get close to the recall for a new "from scratch" graph, and I struggled to get any improvement. At the time I wasn't even concerning myself about deletions.

@msokolov That's interesting. I can't think of a way to guarantee that the graphs merge fully, what approach did you take? I guess one method for this might be for semi-inserting nodes (node retains some of its neighbors but also gets new ones) from the smaller graph into the larger graph. I think that a node in general should have a proportional number of neighbors from each graph based on the size of each graph. So if 2/3rds of all nodes are in the larger graph, 2/3rds of the neighbors from a node in the smaller graph could be updated to nodes in the larger graph. This approach I think would still require the procedure of looking up new neighbors for all of the nodes in the smaller graph - however search space should be smaller and M would also be smaller, so there might be some potential for improvement.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org