Posted to dev@lucene.apache.org by Jonathan Ellis <jb...@gmail.com> on 2023/08/02 15:14:17 UTC

Mixed results with FINGER for HNSW search

Hi all,

FINGER [1] is a recent paper that describes how to accelerate graph-based
ANN search by using a faster, approximate distance measurement for most
query node / graph node comparisons instead of doing full N-dimension
vector calculations.
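For anyone who hasn't read the paper, the decomposition at its core (as I understand it) can be sketched in a few lines: when scoring a neighbor x while exploring from an already-visited node c, q.x splits exactly into a component along c plus a residual inner product, and FINGER only needs to approximate the residual term.  `decompose_dot` is a hypothetical helper of mine, not code from my branch, and it assumes c is unit-norm:

```python
def decompose_dot(q, x, c):
    """Split q.x into a component along unit vector c plus a residual term.
    This is an exact identity; FINGER approximates only the residual
    inner product q_res.x_res."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    qc, xc = dot(q, c), dot(x, c)
    q_res = [qi - qc * ci for qi, ci in zip(q, c)]
    x_res = [xi - xc * ci for xi, ci in zip(x, c)]
    # q.x == qc*xc + q_res.x_res when c has unit norm
    return qc * xc, dot(q_res, x_res)
```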

I've implemented a proof of concept of Finger for Lucene HNSW (in memory),
but I have not been able to replicate the paper's claims of massive
speedup; instead I see modest to negligible gains.  For example, for the
nytimes-256 dataset, at (M, ef) = (16, 100) and topK=10 [2], I get

Vanilla HNSW: 0.72 s to search 10k vectors, recall = 71.5%
HNSW-FINGER: 0.70 s to search 10k vectors, recall = 68.9%

For the same dataset at (M, ef) = (128, 512), I get
Vanilla HNSW: 3.17 s to search 10k vectors, recall = 99.1%
HNSW-FINGER: 1.93 s to search 10k vectors, recall = 97.3%

The FINGER paper does not give exact M/ef, but for recall=70% they show a
speedup of 7000 ops/s -> 17000 ops/s on this dataset in Figure 4, almost
2.5x.

I've looked at this code every way I can think of and I'm stuck: I can't
see what I'm doing wrong or differently.  The approximation code is fairly
straightforward arithmetic, and profiling confirms that the
approximate-similarity function is in fact ~10x faster than the
exact-similarity function.  The problem seems to be that it still needs to
make a significant number of exact-similarity calls.  At (16, 100) it's
making about 2/3 as many exact-similarity calls as vanilla HNSW, and at
(128, 512) it's making about 1/4 as many (hence the performance
improvement). [3]  I believe this is because the approximated similarity
simply isn't high-fidelity enough.
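For reference, the pruning pattern that determines when an exact call happens looks roughly like this.  This is a simplified sketch of my reading of Algorithm 2, not the actual HnswSearcher code, and the names are mine:

```python
import heapq

def search_neighbors(sim_exact, sim_approx, neighbors, results, k):
    """Sketch of the FINGER-style gate: results is a min-heap of
    (similarity, node) pairs; the exact similarity is computed only
    when the cheap approximate estimate clears the current k-th best.
    Returns the number of exact-similarity calls made."""
    exact_calls = 0
    for n in neighbors:
        if len(results) >= k and sim_approx(n) <= results[0][0]:
            continue  # cheap estimate says n cannot enter the top-k
        exact_calls += 1
        s = sim_exact(n)
        if len(results) < k:
            heapq.heappush(results, (s, n))
        elif s > results[0][0]:
            heapq.heapreplace(results, (s, n))
    return exact_calls
```

The point of the measurement above is that the savings depend entirely on how often the first branch fires, which is where the fidelity of the estimate bites.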

The reason that it's hard to be sure is that there are two key components
to the FINGER approximate distance:
1. Calculate the optimal basis to project the vectors onto, and
2. Use the signs of the vectors projected onto that basis to estimate the
angle between them.

It's very difficult to rule out bugs in either of these parts, since there
are no "known good" values to compare against.  Compounding the challenge,
#2 is a probabilistic result, so you really have to look at averages over
many calls.
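To illustrate #2: the sign/Hamming estimate is essentially the random-hyperplane LSH identity, P(signs differ) = angle / pi, so the Hamming distance between sign bit vectors, divided by r, estimates the angle.  A toy sketch with hypothetical helper names (a real implementation would use a random or learned basis; the fixed basis here is only for illustration):

```python
import math

def sign_bits(v, basis):
    # project v onto each basis vector and keep only the sign
    return [1 if sum(vi * bi for vi, bi in zip(v, b)) >= 0 else 0
            for b in basis]

def estimated_cos(bits_a, bits_b):
    # hamming/r estimates the angle between the two original vectors,
    # from which we recover an estimated cosine similarity
    r = len(bits_a)
    hamming = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.cos(math.pi * hamming / r)
```

The estimate is only unbiased on average over the basis choice, which is exactly why single-call comparisons against exact similarity tell you so little.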

Here are some of the properties of my implementation that I have verified
behave as expected:

1. The basis vectors are orthonormal, and they preserve more variance and
yield less reconstruction error than randomly chosen basis vectors
2. For trivial cases, the basis vectors match what an independent numpy
implementation derives
3. As r (the dimension of the basis vectors) increases, the error of the
estimated similarity compared to the actual similarity decreases
4. Using the actual angle in the low-dimensional basis instead of the
Hamming approximation improves things, but (it seems to me) not by enough
to indicate that the approximation is broken.
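For example, the orthonormality half of property 1 reduces to a check like this (a toy version of the kind of assertion TestFingerMetadata makes; `is_orthonormal` is a name I made up here):

```python
def is_orthonormal(basis, tol=1e-9):
    """Check that pairwise dot products of the basis vectors are ~1 on
    the diagonal (unit length) and ~0 off it (mutually orthogonal)."""
    for i, u in enumerate(basis):
        for j, v in enumerate(basis):
            d = sum(ui * vi for ui, vi in zip(u, v))
            expected = 1.0 if i == j else 0.0
            if abs(d - expected) > tol:
                return False
    return True
```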

If someone can take a look and see if I missed something, I'd appreciate
that.  Code is at https://github.com/jbellis/lucene/tree/concurrent-finger,
the highlights are

- FingerMetadata: where all the work happens
- TestFingerMetadata: verifies properties 1-3 above
- HnswGraphTestCase.testFinger*: compares Finger and vanilla results on
random data
- HnswSearcher: a cleaned-up HnswGraphSearcher with a Builder that adds
Finger support (search for useFinger)
- VectorMath: utility methods building on Apache Commons Math, which I just
copied into the source tree as hnsw.math package to get it working for now

Finally, the harness I used for testing the nytimes and other datasets is
at https://github.com/jbellis/hnswdemo/tree/finger.  (This code works with
the hdf5 datasets provided by https://github.com/erikbern/ann-benchmarks/.)

Much obliged for any ideas you have!

P.S. I'm also available on ASF slack #lucene-dev if you want to discuss
there.

[1] https://dl.acm.org/doi/pdf/10.1145/3543507.3583318
[2] The FINGER paper prefers to measure Recall@10 even though many datasets
offer Recall@100. I did not see a massive difference in Finger's
performance advantage either way.
[3] See Algorithm 2 line 19 in the paper for when exact similarity is
performed, corresponding to line 257 in HnswSearcher.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced