You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/02/28 03:03:28 UTC

[GitHub] [lucene-solr] irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers

irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-592287771
 
 
   > I believe in practice that results. max size is always set to ef, so there shouldn't be any real issue. I agree that the interface doesn't make that plain; we should enforce this invariant by API contract
   
   I agree that the max size is always set to _ef_, but _ef_ has different values in different layers. 
   According to **Algorithm 5** of [papar](https://arxiv.org/pdf/1603.09320.pdf),  HNSW searches the nearest one (namely, _ef_=1) neighbor from top layer to the 1st layer,  and then finds the nearest _ef_ (_ef_=topK) neighbors from layer 0. In the implementation of Lucene HNSW, the actual size of result queue (Line 63, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)) is set to _ef_=topK when searching from top layer to the 1st layer, result in finding more neighbors than expected. Even if the parameter _ef_ is set to 1 in Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java), the code `if (dist < f.distance() || results.size() < ef)` (Line 87, [HNSWGraph](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraph.java)) allows insert more than 1 neighbors to the "results" when `dist < f.distance()` because the max size of "results" is _ef_=topK.
   
   The simplest way to check this problem is to print the actual size of neighbors. For example, add "System.out.println(neighbors.size());" after "visitedCount += hnsw.searchLayer(query, neighbors, 1, l, vectorValues);" (Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)), where the nearest one neighbor is expected, but the actual neighbor size would be range from 1~topK. Which also applies to [HNSWGraphWriter](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphWriter.java).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org