Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/08 21:38:41 UTC

[GitHub] [lucene] jmazanec15 commented on issue #11354: Reuse HNSW graphs when merging segments? [LUCENE-10318]

jmazanec15 commented on issue #11354:
URL: https://github.com/apache/lucene/issues/11354#issuecomment-1307862709

   Hi @mayya-sharipova @jtibshirani @msokolov 
   
   I figured out the recall issue in the previous tests: I was not using the copy of the vectors when recomputing the distances. I fixed that and re-ran the benchmarks, and the recall values now match the control:
   
   ### Results
   #### 10K segment size
   | Experiment | Time to merge (ms) | QPS | Recall |
   | ----------- | ----------------------------- | --- | ------ |
   | Control 1 | 611190 | 740 | 0.977 |
   | Control 2 | 621678 | 769 | 0.977 |
   | Control 3 | 619656 | 769 | 0.977 |
   | Test 1 | 649492 | 793 | 0.977 |
   | Test 2 | 663221 | 813 | 0.977 |
   | Test 3 | 594122 | 775 | 0.977 |
   
   #### 100K segment size
   | Experiment | Time to merge (ms) | QPS | Recall |
   | ----------- | ----------------------------- | --- | ------ |
   | Control 1 | 621603 | 775 | 0.977 |
   | Control 2 | 627452 | 769 | 0.977 |
   | Control 3 | 628613 | 833 | 0.977 |
   | Test 1 | 583509 | 813 | 0.977 |
   | Test 2 | 608190 | 735 | 0.977 |
   | Test 3 | 602910 | 763 | 0.977 |
   
   #### 500K segment size
   | Experiment | Time to merge (ms) | QPS | Recall |
   | ----------- | ----------------------------- | --- | ------ |
   | Control 1 | 671704 | 763 | 0.977 |
   | Control 2 | 643735 | 714 | 0.977 |
   | Control 3 | 639047 | 800 | 0.977 |
   | Test 1 | 369440 | 787 | 0.977 |
   | Test 2 | 361934 | 787 | 0.977 |
   | Test 3 | 346221 | 735 | 0.977 |
   
   That being said, I think initialization from a graph helps most when a larger segment is being merged with other segments. For instance, on the 1M data set, when the segment size is 500K, merge time looks a lot better; however, when the segment size is 10K, merge time is not noticeably different from the control.
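
   To put numbers on that comparison, here is a minimal sketch that averages the three runs per group and reports the relative change in merge time against the control. The merge times are copied from the tables above; the names `merge_ms`, `relative_change`, and `summary` are just illustrative, and averaging three runs is a simplification, not a statistical test.

```python
from statistics import mean

# Merge times in milliseconds, copied from the results tables above.
merge_ms = {
    "10K": {"control": [611190, 621678, 619656], "test": [649492, 663221, 594122]},
    "100K": {"control": [621603, 627452, 628613], "test": [583509, 608190, 602910]},
    "500K": {"control": [671704, 643735, 639047], "test": [369440, 361934, 346221]},
}


def relative_change(control, test):
    """Return (mean(test) - mean(control)) / mean(control).

    Negative values mean the test runs merged faster than the control.
    """
    control_mean = mean(control)
    return (mean(test) - control_mean) / control_mean


# Relative change in mean merge time for each segment size.
summary = {
    size: relative_change(runs["control"], runs["test"])
    for size, runs in merge_ms.items()
}

for size, change in summary.items():
    print(f"{size}: {change:+.1%}")
```

   On these numbers the mean merge time changes by roughly +3% at 10K, -4% at 100K, and -45% at 500K, which is consistent with the benefit showing up mainly for the larger segment.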
   
   What do you think would be good next steps? I could either take the PR out of draft state for review or focus on running more experiments on different data sets. Before doing either, I wanted to hear what you think of the results so far.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

