You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Michael McCandless (Jira)" <ji...@apache.org> on 2021/01/13 14:41:00 UTC

[jira] [Commented] (LUCENE-9644) HNSW diverse neighbor selection heuristic

    [ https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264190#comment-17264190 ] 

Michael McCandless commented on LUCENE-9644:
--------------------------------------------

This change seems to have caused a surprising jump in Lucene's nightly {{VectorSearch}} performance: [https://home.apache.org/~mikemccand/lucenebench/VectorSearch.html] (annotation DJ).

> HNSW diverse neighbor selection heuristic
> -----------------------------------------
>
>                 Key: LUCENE-9644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9644
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will replace the simple nearest neighbor selection with a criterion that takes into account the distance of the neighbors from each other. It is seen to provide dramatically improved recall on at least two datasets, and is what is being used by our reference implementation, hnswlib. The basic idea is that when selecting  the nearest neighbors to associate with a new node added to the graph, we filter using a diversity criterion. If a candidate neighbor is closer to an already-added (closer to the new node) neighbor than it is to the new node, then we pass over it, moving on to more-distant, but presumably more diverse neighbors. The same criterion is also (re-) applied to the neighbor nodes' neighbors, since we add the links bidirectionally.
> h2. Results:
> h3. GloVe/Wikipedia
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||	index ms||
> |0.643|	 0.77|	100000|	50|	32|	64|	1742|	22830|
> |0.671|	 0.95|	100000|	100|	32|	64|	2141|	0|
> |0.704|	 1.32|	100000|	200|	32|	64|	2923|	0|
> |0.739|	 2.04|	100000|	400|	32|	64|	4382|	0|
> |0.470|	 0.91|	1000000|	50|	32|	64|	2068|	337081|
> |0.496|	 1.21|	1000000|	100|	32|	64|	2548|	0|
> |0.533|	 1.77|	1000000|	200|	32|	64|	3479|	0|
> |0.573|	 2.58|	1000000|	400|	32|	64|	5257|	0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||	index ms||
> |0.801|	 0.57|	100000|	50|	32|	64|	593|	17985|
> |0.840|	 0.67|	100000|	100|	32|	64|	738|	0|
> |0.883|	 0.97|	100000|	200|	32|	64|	1018|	0|
> |0.921|	 1.36|	100000|	400|	32|	64|	1502|	0|
> |0.723|	 0.71|	1000000|	50|	32|	64|	860|	298383|
> |0.761|	 0.77|	1000000|	100|	32|	64|	1058|	0|
> |0.806|	 1.06|	1000000|	200|	32|	64|	1442|	0|
> |0.854|	 1.67|	1000000|	400|	32|	64|	2159|	0|
> h3. Dataset from work:
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||	index ms||
> |0.933|	 1.41|	100000|	50|	32|	64|	1496|	35462|
> |0.948|	 1.39|	100000|	100|	32|	64|	1872|	0|
> |0.961|	 2.10|	100000|	200|	32|	64|	2591|	0|
> |0.972|	 3.04|	100000|	400|	32|	64|	3939|	0|
> |0.827|	 1.34|	1000000|	50|	32|	64|	1676|	535802|
> |0.854|	 1.76|	1000000|	100|	32|	64|	2056|	0|
> |0.887|	 2.47|	1000000|	200|	32|	64|	2761|	0|
> |0.907|	 3.75|	1000000|	400|	32|	64|	4129|	0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||	index ms||
> |0.966|	 1.18|	100000|	50|	32|	64|	1480|	37656|
> |0.977|	 1.46|	100000|	100|	32|	64|	1832|	0|
> |0.988|	 2.00|	100000|	200|	32|	64|	2472|	0|
> |0.995|	 3.14|	100000|	400|	32|	64|	3629|	0|
> |0.944|	 1.34|	1000000|	50|	32|	64|	1780|	526834|
> |0.959|	 1.71|	1000000|	100|	32|	64|	2222|	0|
> |0.975|	 2.30|	1000000|	200|	32|	64|	3041|	0|
> |0.986|	 3.56|	1000000|	400|	32|	64|	4543|	0|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org