Posted to issues@lucene.apache.org by "Mayya Sharipova (Jira)" <ji...@apache.org> on 2022/08/17 20:02:00 UTC

[jira] [Comment Edited] (LUCENE-10318) Reuse HNSW graphs when merging segments?

    [ https://issues.apache.org/jira/browse/LUCENE-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580979#comment-17580979 ] 

Mayya Sharipova edited comment on LUCENE-10318 at 8/17/22 8:01 PM:
-------------------------------------------------------------------

Thanks for looking into this, Jack. 

We have not done any development on this, but here are some thoughts from us:
 * MergePolicy appears to choose segments of approximately the same size, so during a merge we may not have a single big segment whose graph we can reuse. I would imagine that for many use cases it may not be worth reusing graphs (especially if the segments are relatively small): the extra complexity would not justify a very small speedup.
 * I agree with your thoughts on deletions: it may also not be worth reusing graphs if heavy deletions are present.

So maybe a good start would be a very lean prototype with a lot of performance benchmarks.
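
To put a rough number on the segment-size concern, here is a back-of-the-envelope sketch. It assumes every insertion costs about the same, which is a simplification (insertion cost grows with graph size), and the class and method names are hypothetical, not Lucene APIs. It only estimates what fraction of insertions would be skipped if the largest segment's graph were reused as-is.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class ReuseSavingsEstimate {

  /**
   * Returns the fraction of graph insertions that would be skipped if the
   * graph of the largest segment were reused unchanged.
   * Assumption: every insertion has roughly equal cost.
   */
  static double savedFraction(int[] liveVectorCounts) {
    long total = 0;
    int largest = 0;
    for (int count : liveVectorCounts) {
      total += count;
      largest = Math.max(largest, count);
    }
    // An empty merge saves nothing; also avoids division by zero.
    return total == 0 ? 0.0 : (double) largest / total;
  }
}
```

Under that assumption, merging ten equally sized segments skips only about 10% of the insertions, while merging one 9000-vector segment with two 500-vector segments skips about 90%. Actual speedups would need to be measured with benchmarks.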


was (Author: mayyas):
Thanks for looking into this, Jack. 

We have not done any development on this, but here are some thoughts from us (maybe Julie can add more):
 * MergePolicy appears to choose segments of approximately the same size, so during a merge we may not have a single big segment whose graph we can reuse. I would imagine that for many use cases it may not be worth reusing graphs (especially if the segments are relatively small): the extra complexity would not justify a very small speedup.
 * I agree with your thoughts on deletions: it may also not be worth reusing graphs if heavy deletions are present.

So maybe a good start would be a very lean prototype with a lot of performance benchmarks.

> Reuse HNSW graphs when merging segments?
> ----------------------------------------
>
>                 Key: LUCENE-10318
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10318
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Major
>
> Currently when merging segments, the HNSW vectors format rebuilds the entire graph from scratch. In general, building these graphs is very expensive, and it'd be nice to optimize it in any way we can. I was wondering if during merge, we could choose the largest segment with no deletes, and load its HNSW graph into heap. Then we'd add vectors from the other segments to this graph, through the normal build process. This could cut down on the number of operations we need to perform when building the graph.
> This is just an early idea, I haven't run experiments to see if it would help. I'd guess that whether it helps would also depend on details of the MergePolicy.
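
The selection step described above could be sketched roughly as follows. This is only an illustration of the idea, not Lucene code: `SegmentStats` and `GraphReuseSketch` are hypothetical stand-ins, and loading the chosen graph into heap and inserting the remaining vectors are left out.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical per-segment statistics; not a Lucene class. */
final class SegmentStats {
  final String name;
  final int vectorCount;   // all vectors in the segment, including deleted ones
  final int deletedCount;  // vectors belonging to deleted documents

  SegmentStats(String name, int vectorCount, int deletedCount) {
    this.name = name;
    this.vectorCount = vectorCount;
    this.deletedCount = deletedCount;
  }
}

/** Hypothetical sketch of choosing which segment's graph to reuse. */
final class GraphReuseSketch {

  /** Picks the largest segment that has no deletes, if there is one. */
  static Optional<SegmentStats> pickSeed(List<SegmentStats> segments) {
    return segments.stream()
        .filter(s -> s.deletedCount == 0)
        .max(Comparator.comparingInt((SegmentStats s) -> s.vectorCount));
  }

  /** Counts the live vectors that still go through the normal build process. */
  static long vectorsToInsert(List<SegmentStats> segments, Optional<SegmentStats> seed) {
    long live = 0;
    for (SegmentStats s : segments) {
      live += s.vectorCount - s.deletedCount;
    }
    // The seed has no deletes, so all of its vectors are already in the graph.
    return live - seed.map(s -> s.vectorCount).orElse(0);
  }
}
```

If no segment is free of deletes, `pickSeed` returns an empty `Optional` and the merge would fall back to rebuilding the graph from scratch, which matches the current behavior.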


