You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/06 18:54:50 UTC
[GitHub] [lucene] mayya-sharipova opened a new pull request, #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova opened a new pull request, #872:
URL: https://github.com/apache/lucene/pull/872
The original HNSW paper (https://arxiv.org/pdf/1603.09320.pdf) suggests
to use a different maxConn for the upper layers vs. the bottom one
(which contains the full neighborhood graph). Specifically, they
suggest using maxConn=M for upper layers and maxConn=2*M for the bottom.
This patch ensures that we follow this recommendation and use
maxConn=2*M for the bottom layer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1119925540
This patch involves format change, but for now I made changes in Lucene91* files.
The plan is once another https://github.com/apache/lucene/pull/870 that introduces Lucene92* format is merged to the main branch, I will reorganize this PR to use Lucene92* format.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870464398
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
+ private final int maxConn0; // max number of connections on the 0th (last) layer
Review Comment:
Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r869226440
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
Review Comment:
@mocobeta Thanks for checking, indeed it is good to rename `maxConn` to `M` parameter in the paper, and `maxConn` is ambiguous (as we have two values for `maxConn` now: for upper layers and a lower layer)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1121379649
Thanks, this looks the same as what I was seeing now! It's good motivation to add Lucene to ann-benchmarks so we can stop using a custom local benchmark set-up!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r868929217
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
Review Comment:
Sorry - I missed the previous Julie's comment. If it makes sense to you, I'm fine with it. Maybe we should respect the parameter name in the original paper, like BM25's k and b.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870464781
##########
lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java:
##########
@@ -256,10 +256,11 @@ public void testSearchWithSelectiveAcceptOrds() throws IOException {
public void testSearchWithSkewedAcceptOrds() throws IOException {
int nDoc = 1000;
+ int maxConn = 16;
Review Comment:
Good suggestion, I've decided to pull out this parameter. Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1125028616
@jtibshirani @mocobeta Thanks for your review. I've changed the current PR based on the recent vector format changes from `main` branch – mostly copy, just in `OnHeapHnswGraph`, I've decided instead of `M` field to have dedicated `nsize` and `nsize0` - neighbours size, so not to calculate them for every node, which also made the code tidier. I hope this is fine.
I plan to merge this PR and backport it to branch_9x today.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1121144832
@jtibshirani Thanks for the comment. I've rerun the benchmarks as you suggested, and here are the new results
```txt
k Approach Recall QPS
10 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.571 1874.073
50 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.801 752.443
100 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.865 463.214
500 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.959 129.944
800 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.974 87.815
1000 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.980 73.514
10 hnswlib ({'M': 32, 'efConstruction': 100}) 0.552 16745.433
50 hnswlib ({'M': 32, 'efConstruction': 100}) 0.794 5738.468
100 hnswlib ({'M': 32, 'efConstruction': 100}) 0.860 3336.386
500 hnswlib ({'M': 32, 'efConstruction': 100}) 0.956 832.982
800 hnswlib ({'M': 32, 'efConstruction': 100}) 0.973 541.097
1000 hnswlib ({'M': 32, 'efConstruction': 100}) 0.979 442.163
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r868923104
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
Review Comment:
Does it make sense to have a bit more descriptive name for it? Or should we just call `M`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r868594873
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java:
##########
@@ -53,12 +53,14 @@ public final class Lucene91HnswVectorsWriter extends KnnVectorsWriter {
private final int maxDoc;
private final int maxConn;
+ private final int maxConn0;
private final int beamWidth;
private boolean finished;
- Lucene91HnswVectorsWriter(SegmentWriteState state, int maxConn, int beamWidth)
+ Lucene91HnswVectorsWriter(SegmentWriteState state, int maxConn, int maxConn0, int beamWidth)
Review Comment:
@jtibshirani Thanks for the comment, this is a great suggestion! Addressed in b1a6394402d8d535b092de73b227fb5a40015c65
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1120061467
@mayya-sharipova what is the ann-benchmarks set-up that you're using? I found that in order to get comparable results to hnswlib, I had to make sure to adjust `efSearch` to these parameters:
```
run-groups:
M-4:
arg-groups:
- {"M": 32, "efConstruction": 100}
query-args: [[1000, 0, 40, 70, 90, 490, 790, 990]]
```
First, there is a warm-up run that we throw away. Then we subtract 10 from the other `efSearch` since `KnnGraphTester` adds `k=10` to this value:
```
private static TopDocs doKnnVectorQuery(
IndexSearcher searcher, String field, float[] vector, int k, int fanout) throws IOException {
return searcher.search(new KnnVectorQuery(field, vector, k + fanout), k);
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870023604
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##########
@@ -377,10 +377,13 @@ private static class FieldEntry {
for (int level = 0; level < numLevels; level++) {
if (level == 0) {
graphOffsetsByLevel[level] = 0;
+ } else if (level == 1) {
+ int numNodesOn0Level = size;
Review Comment:
minor: `numNodesOnLevel0` might be clearer at the first glance (and consistent with other parts)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1119928464
I've run an evaluation:
baseline: main branch **glove-100-angular M:64 efConstruction:100**
candidate: this PR **glove-100-angular M:32 efConstruction:100**, **M:64 is used for 0th layer**
baseline: Indexed 1183514 documents in 1087s
candidate: Indexed 1183514 documents in 1097s
| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| ----------- | --------------: | -----------: | ---------------: | ------------: |
| n_cands=10 | 0.686 | 1357.005 | 0.685 | 1356.824 |
| n_cands=20 | 0.742 | 1078.005 | 0.741 | 1073.286 |
| n_cands=40 | 0.802 | 783.909 | 0.801 | 779.959 |
| n_cands=80 | 0.857 | 519.065 | 0.856 | 516.899 |
| n_cands=120 | 0.885 | 394.007 | 0.884 | 394.255 |
| n_cands=200 | 0.916 | 270.300 | 0.916 | 267.125 |
| n_cands=400 | 0.951 | 156.148 | 0.951 | 154.515 |
| n_cands=600 | 0.966 | 112.399 | 0.966 | 112.545 |
| n_cands=800 | 0.974 | 88.135 | 0.975 | 87.845 |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870466493
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -31,6 +31,7 @@
public final class OnHeapHnswGraph extends HnswGraph {
private final int maxConn;
+ private final int maxConn0;
Review Comment:
Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870464973
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##########
@@ -377,10 +377,13 @@ private static class FieldEntry {
for (int level = 0; level < numLevels; level++) {
if (level == 0) {
graphOffsetsByLevel[level] = 0;
+ } else if (level == 1) {
+ int numNodesOn0Level = size;
Review Comment:
Good suggestion! Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r869686535
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
+ private final int maxConn0; // max number of connections on the 0th (last) layer
Review Comment:
Small comment, I think it'd be clearer to just store the parameter `M` and compute `maxConn0` on the fly. It's such a simple calculation it doesn't really help to store it.
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -68,42 +69,43 @@ public final class HnswGraphBuilder {
*
* @param vectors the vectors whose relations are represented by the graph - must provide a
* different view over those vectors than the one used to add via addGraphNode.
- * @param maxConn the number of connections to make when adding a new graph node; roughly speaking
- * the graph fanout.
+ * @param M the number of connections to make when adding a new graph node; roughly speaking the
Review Comment:
I guess we should update this to mention that the last layer is treated differently (it uses max 2*M connections?)
##########
lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java:
##########
@@ -256,10 +256,11 @@ public void testSearchWithSelectiveAcceptOrds() throws IOException {
public void testSearchWithSkewedAcceptOrds() throws IOException {
int nDoc = 1000;
+ int maxConn = 16;
Review Comment:
I guess this should be called `M`? Or maybe we don't even need to pull out this parameter?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870035060
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -31,6 +31,7 @@
public final class OnHeapHnswGraph extends HnswGraph {
private final int maxConn;
+ private final int maxConn0;
Review Comment:
I guess we could have just `M` here too?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870030770
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
+ private final int maxConn0; // max number of connections on the 0th (last) layer
Review Comment:
I'd agree with this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r868929217
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
Review Comment:
Sorry - I missed the previous Julie's comment. If it makes sense to you, I'm fine with it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
jtibshirani commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r868255093
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java:
##########
@@ -53,12 +53,14 @@ public final class Lucene91HnswVectorsWriter extends KnnVectorsWriter {
private final int maxDoc;
private final int maxConn;
+ private final int maxConn0;
private final int beamWidth;
private boolean finished;
- Lucene91HnswVectorsWriter(SegmentWriteState state, int maxConn, int beamWidth)
+ Lucene91HnswVectorsWriter(SegmentWriteState state, int maxConn, int maxConn0, int beamWidth)
Review Comment:
I was thinking we could just keep a single configuration parameter here, and internally calculate `maxConn0 = 2 * M`. If we allow it to be passed as a parameter, it seems like it's important to be able to configure it, but that's not the case (it is not something users will change and should always be set to `2 * M`). Like that we could also avoid writing a new value `maxConn0` into the format, which doesn't seem necessary?
If we are worried about naming, we could rename `maxConn` to `M`. From my perspective, it's okay to use single-letter variable names (with a clear comment!) when it directly corresponds to a paper's algorithm.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870463506
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -68,42 +69,43 @@ public final class HnswGraphBuilder {
*
* @param vectors the vectors whose relations are represented by the graph - must provide a
* different view over those vectors than the one used to add via addGraphNode.
- * @param maxConn the number of connections to make when adding a new graph node; roughly speaking
- * the graph fanout.
+ * @param M the number of connections to make when adding a new graph node; roughly speaking the
Review Comment:
Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r868923104
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
/** Random seed for level generation; public to expose for testing * */
public static long randSeed = DEFAULT_RAND_SEED;
- private final int maxConn;
+ private final int M; // max number of connections on upper layers
Review Comment:
Does it make sense to have a bit more descriptive name for it? Or should we just call this `M`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova merged pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova merged PR #872:
URL: https://github.com/apache/lucene/pull/872
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1123937553
@jtibshirani @mocobeta Thanks for your review. I've addressed your latest comments in d16168f77cbeae581a093daa53113bf897ab8e31.
The plan for merging this PR is following: once [other changes](https://github.com/apache/lucene/pull/880) to vectors' format are merged to `main`, I will change this PR based on `main`, that all these changes will go to `Lucene92` vector format files instead of the current `Lucene91` vector format files.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
Posted by GitBox <gi...@apache.org>.
mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1119937491
Also here is the comparison of recall with hnswlib (results from hnwlib are copied from the [Jira issue](https://issues.apache.org/jira/browse/LUCENE-10527), and not run on my machine)
```txt
k Approach Recall QPS
10 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.685 1372.022
50 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.820 680.472
100 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.872 438.112
500 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.960 126.510
800 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.975 87.441
1000 luceneknn dim=100 {'M': 32, 'efConstruction': 100} 0.980 73.193
10 hnswlib ({'M': 32, 'efConstruction': 100}) 0.552 16745.433
50 hnswlib ({'M': 32, 'efConstruction': 100}) 0.794 5738.468
100 hnswlib ({'M': 32, 'efConstruction': 100}) 0.860 3336.386
500 hnswlib ({'M': 32, 'efConstruction': 100}) 0.956 832.982
800 hnswlib ({'M': 32, 'efConstruction': 100}) 0.973 541.097
1000 hnswlib ({'M': 32, 'efConstruction': 100}) 0.979 442.163
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org