You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Michael Sokolov (Jira)" <ji...@apache.org> on 2021/04/25 15:36:00 UTC
[jira] [Commented] (LUCENE-9937) ann-benchmarks results for HNSW search

    [ https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331562#comment-17331562 ] 

Michael Sokolov commented on LUCENE-9937:
-----------------------------------------

I think an important statistic that helps when comparing these two implementations is the total number of nodes visited per search. In the hnswlib code, you can get this by running the `updates_test` in that package, which records it as `distance_comp_per_query` and we report it as the value of `TopDocs.totalHits` returned from the HNSW search. I tried this a little while ago and found comparable numbers for both implementations for that statistic (for the same graph construction and query parameters). This leads me to think the graph traversal is working OK, but I did this several months ago and didn't carefully record the parameters and test setup - it would be worth checking again.

Another caveat: I think you need to make sure that hnswlib is running in single-threaded mode in order to make a fair comparison, but since the roughly 7-8x difference in performance you recorded is about the same as what I saw, I suspect you must have done that too.

I also did some comparisons of raw dot-product calculation performance across C++ and Java and saw a pretty big difference there. Unfortunately, even using the JDK16 Vector API, which as Robert Muir demonstrated gives a substantial performance boost to raw dot product performance in Java, I only saw a ~30% improvement in overall QPS.

Thanks for digging in to this! The gap is bothersome, and hard to understand.

> ann-benchmarks results for HNSW search
> --------------------------------------
>
>                 Key: LUCENE-9937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9937
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> I hooked up our HNSW implementation to [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used repo for benchmarking nearest neighbor search libraries against large datasets. I found the results interesting and opened this issue to share and discuss. My benchmarking code can be found [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy to commit but I’m happy to help anyone get set up with it.
> Approaches
>  * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a brute force scan to determine nearest neighbors
>  * LuceneHnsw: our HNSW implementation
>  * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation from the author of the paper
> Datasets
>  * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, comparing euclidean distance
>  * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, comparing cosine similarity
> *Results on sift-128-euclidean*
>  Parameters: M=16, efConstruction=500
> {code:java}
> Approach                Recall  QPS
> LuceneVectorsOnly()     1.000      6.764
> LuceneHnsw(n_cands=10)  0.603   7736.968
> LuceneHnsw(n_cands=50)  0.890   3605.000
> LuceneHnsw(n_cands=100) 0.953   2237.429
> LuceneHnsw(n_cands=500) 0.996    570.900
> LuceneHnsw(n_cands=800) 0.998    379.589
> hnswlib(n_cands=10)     0.713  69662.756
> hnswlib(n_cands=50)     0.985  16108.538
> hnswlib(n_cands=100)    0.950  28021.582
> hnswlib(n_cands=500)    1.000   4115.886
> hnswlib(n_cands=800)    1.000   2729.680
> {code}
> *Results on glove-100-angular*
>  Parameters: M=32, efConstruction=500
> {code:java}
> Approach                Recall  QPS
> LuceneVectorsOnly()     1.000      6.764
> LuceneHnsw(n_cands=10)  0.507   5036.236
> LuceneHnsw(n_cands=50)  0.760   2099.850
> LuceneHnsw(n_cands=100) 0.833   1233.690
> LuceneHnsw(n_cands=500) 0.941    309.077
> LuceneHnsw(n_cands=800) 0.961    203.782
> hnswlib(n_cands=10)     0.597  43543.345
> hnswlib(n_cands=50)     0.832  14719.165
> hnswlib(n_cands=100)    0.897   8299.948
> hnswlib(n_cands=500)    0.981   1931.985
> hnswlib(n_cands=800)    0.991    881.752
> {code}
> Notes on benchmark:
>  * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and computes the recall against the true neighbors. The recall calculation has a small 'fudge factor' that allows neighbors that are within a small epsilon of the best distance. Queries are executed serially to obtain the QPS.
>  * I chose parameters where hnswlib performed well, then passed these same parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and beamWidth as efConstruction. For search parameters, I set k to k, and fanout as (num_cands - k) so that the beam search is of size num_cands. Note that our default value for beamWidth is 16, which is really low – I wasn’t able to obtain acceptable recall until I bumped it to closer to 500 to match the hnswlib default.
>  * I force-merged to one segment before running searches since this gives the best recall + QPS, and also to match hnswlib.
> Some observations:
>  * It'd be really nice to extend luceneutil to measure vector search recall in addition to latency. That would help ensure we’re benchmarking a more realistic scenario, instead of accidentally indexing/ searching at a very low recall. Tracking recall would also guard against subtle, unintentional changes to the algorithm. It's easy to make an accidental change while refactoring, and with approximate algorithms, unit tests don't guard as well against this.
>  * Lucene HNSW gives a great speed-up over the baseline without sacrificing too much recall. But it doesn't perform as well as hnswlib in terms of both recall and QPS. We wouldn’t expect the results to line up perfectly, since Lucene doesn't actually implement HNSW – the current algorithm isn't actually hierarchical and only uses a single graph layer. Does this difference might indicate we're leaving performance 'on the table' by not using layers, which (I don't think) adds much index time or space? Or are there other algorithmic improvements would help close the gap?
>  * Setting beamWidth to 500 *really* slowed down indexing. I'll open a separate issue with indexing speed results, keeping this one focused on search.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org