
Posted to issues@lucene.apache.org by "benwtrent (via GitHub)" <gi...@apache.org> on 2024/02/06 14:11:12 UTC

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

benwtrent commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1929761190

   I ran some experiments of my own, testing Vamana (vectors in-graph) and HNSW, both with `int8` quantization. My Lucene branch is here: https://github.com/apache/lucene/compare/main...benwtrent:lucene:feature/diskann-exp and my luceneutil branch is here: https://github.com/mikemccand/luceneutil/compare/master...benwtrent:luceneutil:feature/vamana-testing
   
   In low-memory environments, HNSW performed better (confirming the results here: https://github.com/apache/lucene/issues/12615#issuecomment-1806615864). When the vectors are stored in the graph for Vamana, there were many more page faults (NOTE: I was not using PQ, but trying an apples-to-apples comparison of HNSW & Vamana under the same conditions).
   
   Additionally, looking at previous results (https://github.com/apache/lucene/issues/12615#issuecomment-1868095892):
   
   > vectors: out-of-graph, rerank: sequential | 46.9 ms latency | 170 qps
   
   This indicates that there is very little benefit to Vamana. For DiskANN, one of the touted benefits is being able to "get raw vectors for free" with disk read-ahead when searching the in-memory graph (PQ). If PQ search that reranks against vectors stored outside of the graph performs almost as well (without io_uring), it stands to reason that HNSW with PQ would do just as well. And with a better/smarter PQ implementation (combined with OPQ or something similar), less reranking may be necessary.
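
   To make the "quantized search, then rerank with raw vectors" pattern concrete, here is a minimal sketch. It is not the code from the branches linked above and does not use Lucene's API: the function names, the symmetric `int8` scaling, the brute-force first pass (standing in for a graph traversal), and the `oversample` parameter are all assumptions chosen for illustration.

```python
def quantize_int8(vec, max_abs):
    """Symmetric scalar quantization: map floats in [-max_abs, max_abs] to [-127, 127].

    Assumption: a simplified scheme for illustration, not Lucene's quantizer.
    """
    scale = 127.0 / max_abs
    return [max(-127, min(127, round(x * scale))) for x in vec]


def dot(a, b):
    """Plain dot product; works for both int8 codes and raw floats."""
    return sum(x * y for x, y in zip(a, b))


def search_with_rerank(query, raw_vectors, quantized, max_abs, k, oversample):
    """Two-phase search (hypothetical helper).

    Phase 1 scores every vector with cheap quantized dot products and keeps
    k * oversample candidates. A real index would walk a graph here instead
    of scanning everything.
    Phase 2 rescores only those candidates with the raw float vectors, which
    in the out-of-graph layout means one extra read per candidate.
    """
    q8 = quantize_int8(query, max_abs)
    coarse = sorted(
        range(len(quantized)),
        key=lambda i: dot(q8, quantized[i]),
        reverse=True,
    )
    candidates = coarse[: k * oversample]
    reranked = sorted(
        candidates,
        key=lambda i: dot(query, raw_vectors[i]),
        reverse=True,
    )
    return reranked[:k]
```

   The number of raw-vector reads in phase 2 is `k * oversample`, so a more accurate first pass allows a smaller `oversample` and therefore fewer reads, which is the sense in which a better quantizer can reduce the need for reranking.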
   
   I don't see any stand-out evidence that Vamana has a significant advantage over HNSW as a graph-based vector index.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@lucene.apache.org