Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/09/21 00:56:44 UTC
[GitHub] [lucene] jtibshirani opened a new pull request #312: LUCENE-10109: Bump default beam width for HNSW
jtibshirani opened a new pull request #312:
URL: https://github.com/apache/lucene/pull/312
Lucene90HnswVectorsFormat has a default 'beam width' of 16. This is quite low
and produces poor recall on typical-sized datasets.
This commit bumps it to 100. The new default tries to balance good search
performance against indexing speed. Most runs in ann-benchmarks set the parameter
between ~400 and 800, but those runs heavily favor search performance over indexing speed.
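
For readers unfamiliar with the parameter: the beam width is the size of the candidate list kept while searching the graph for neighbors of a newly inserted vector (the HNSW paper calls it efConstruction). The following is a minimal single-layer sketch of that kind of best-first search, not Lucene's implementation. The function name `search_layer`, the toy graph, and the 1-D positions are all hypothetical and chosen only for illustration.

```python
import heapq

def search_layer(graph, dist, entry, ef):
    """Best-first search over one graph layer, keeping at most `ef` results.

    graph: dict mapping node id -> list of neighbor ids
    dist:  function mapping node id -> distance to the query
    entry: node id where the search starts
    ef:    beam width (maximum size of the result list)
    """
    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest remaining candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-neg_d, n) for neg_d, n in results)

# Hypothetical toy data: four nodes on a line, query at position 0.
positions = {0: 10.0, 1: 5.0, 2: 7.0, 3: 1.0}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
dist = lambda n: abs(positions[n])

narrow = search_layer(graph, dist, entry=0, ef=1)
wide = search_layer(graph, dist, entry=0, ef=2)
print(narrow)
print(wide)
```

In this toy graph the narrow beam stops at node 1, a local minimum, while the wider beam also explores node 2 and reaches node 3, the true nearest node. A wider beam during indexing plays the same role: it finds better neighbor candidates at the cost of more distance computations per inserted vector.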
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #312: LUCENE-10109: Bump default beam width for HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #312:
URL: https://github.com/apache/lucene/pull/312#issuecomment-923615530
[GitHub] [lucene] jtibshirani edited a comment on pull request #312: LUCENE-10109: Bump default beam width for HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani edited a comment on pull request #312:
URL: https://github.com/apache/lucene/pull/312#issuecomment-923615530
Here are some example results on the sift-128-euclidean dataset from ann-benchmarks. The recall/QPS curve improves only slightly once beamWidth passes 100.
**M=16, beamWidth=16**
Built index in 172.130s
```
Approach                  Recall  QPS
LuceneHnsw(n_cands=50)    0.672   4727.125
LuceneHnsw(n_cands=100)   0.774   3071.469
LuceneHnsw(n_cands=500)   0.927    807.171
```
**M=16, beamWidth=100**
Built index in 881.74s
```
Approach                  Recall  QPS
LuceneHnsw(n_cands=50)    0.878   3876.651
LuceneHnsw(n_cands=100)   0.941   2347.171
LuceneHnsw(n_cands=500)   0.993    612.210
```
**M=16, beamWidth=500**
Built index in 3156.50s
```
Approach                  Recall  QPS
LuceneHnsw(n_cands=50)    0.891   3737.885
LuceneHnsw(n_cands=100)   0.952   2229.307
LuceneHnsw(n_cands=500)   0.996    571.994
```
The [HNSW paper](https://arxiv.org/abs/1603.09320) also cites 100 as a reasonable example value for efConstruction, its name for this parameter: "... for a 10M SIFT dataset and shows that a reasonable quality index can be constructed for efConstruction=100 on a 4X 2.4 GHz 10-core Xeon E5-4650 v2 CPU server in just 3 minutes. Further increase of the efConstruction leads to little extra performance but in exchange of significantly longer construction time."
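
To make the trade-off in the tables above concrete, here is a small sketch that computes the build-time ratio and the recall gain between consecutive beamWidth settings. The figures are copied from the tables in this comment; the dictionary layout and the helper name `compare` are illustrative assumptions, not part of ann-benchmarks or Lucene.

```python
# Figures copied from the benchmark tables in this comment (M=16, sift-128-euclidean).
# Keys are beamWidth values; "recall" maps n_cands -> recall.
runs = {
    16:  {"build_s": 172.130, "recall": {50: 0.672, 100: 0.774, 500: 0.927}},
    100: {"build_s": 881.74,  "recall": {50: 0.878, 100: 0.941, 500: 0.993}},
    500: {"build_s": 3156.50, "recall": {50: 0.891, 100: 0.952, 500: 0.996}},
}

def compare(lo, hi, n_cands=100):
    """Return (build-time ratio, recall gain) when raising beamWidth from lo to hi."""
    ratio = runs[hi]["build_s"] / runs[lo]["build_s"]
    gain = runs[hi]["recall"][n_cands] - runs[lo]["recall"][n_cands]
    return ratio, gain

for lo, hi in [(16, 100), (100, 500)]:
    ratio, gain = compare(lo, hi)
    print(f"beamWidth {lo} -> {hi}: build time x{ratio:.1f}, "
          f"recall {gain:+.3f} at n_cands=100")
```

On these numbers, going from 16 to 100 costs roughly 5.1x the build time for a recall gain of 0.167 at n_cands=100, while going from 100 to 500 costs roughly 3.6x more build time for a gain of only 0.011.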
[GitHub] [lucene] jtibshirani merged pull request #312: LUCENE-10109: Bump default beam width for HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani merged pull request #312:
URL: https://github.com/apache/lucene/pull/312