You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/03/11 09:16:18 UTC

[GitHub] [lucene-solr] irvingzhang opened a new pull request #1340: Jira/lucene-9004: Refactor the implementation of HNSW following Faiss

irvingzhang opened a new pull request #1340: Jira/lucene-9004: Refactor the implementation of HNSW following Faiss
URL: https://github.com/apache/lucene-solr/pull/1340
 
 
   There were some misunderstandings in the implementation of HNSW algorithm, making the recall of Lucene HNSW about [5% lower](https://github.com/jtibshirani/ann-benchmarks/pull/1) than Faiss. This PR is meant to implement the HNSW algorithm in a way similar to Faiss. Here're some reasons that why I consider refactoring the implementation,
   1) The major target is to improve the recall percent and develop a correct implementation;
   2) Actually , the relationship between neighborhood is not symmetric. Taking the following picture for example, assume that max connection = 2, for Node3 its nearest neighbors are Node1 and Node2, but the nearest neighbors of Node1 are Node2 and Node4. So I think the connectNodes and shrink operations in current implementation is incorrect!
   ![image](https://user-images.githubusercontent.com/8521429/76395621-c04fb880-63b2-11ea-9a35-bcce5ac95e84.png);
   3) When insert a node at layer l, HNSW tries to greedy search the nearest one Nnearest node from max level layer to (l + 1) layer. Starting from layer l to layer 0, HNSW uses Nnearest rather than all the neighbors of previous layer to search efContruction nearest neighbors. In Faiss, Nnearest keeps unchanged from layer l to layer 0, while re-selecting the nearest node in each layer in nmslib. I followed the implementation of [Faiss HNSW](https://github.com/facebookresearch/faiss/blob/master/impl/HNSW.cpp) since it's much easier and more efficient.
   4) The neighborhood shrinkage strategy is incorrect. It should follow triangle inequality.
   
   It is likely not the best implementation. Further suggestions and discussions are welcomed.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org