You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/11/30 22:06:13 UTC

[GitHub] [lucene-solr] msokolov opened a new pull request #2108: LUCENE-9626 represent HNSW graph neighbors using primitive arrays

msokolov opened a new pull request #2108:
URL: https://github.com/apache/lucene-solr/pull/2108


   The subject line is the main thrust, but there are a few subsidiary changes that were needed in order to achieve that (see below), and I made a few incidental improvements to HNSW-related classes.
   
   1. Add a primitive int-valued heap, IntHeap, based on the existing PriorityQueue
   2. Convert scores to 16-bit precision in order to pack them into an int along with a neighbor ordinal
   
   The result was about a 16% improvement in indexing times, and fewer, smaller GC pauses noted in profiler.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov edited a comment on pull request #2108: LUCENE-9626 represent HNSW graph neighbors using primitive arrays

Posted by GitBox <gi...@apache.org>.
msokolov edited a comment on pull request #2108:
URL: https://github.com/apache/lucene-solr/pull/2108#issuecomment-741790004


   I found some problems with the first implementation posted here, which used a tricky data structure combining parallel arrays with a separate sortable key held in an (int) priority queue. It had bugs, and I realized was wasteful too. This replaces the int queue with a long queue, where the parallel arrays (float score + int node id) are folded together into a single (sortable by score) long. I measured 20% index time improvement (over the baseline, not the previous impl here).
   
   My plan is to push as soon as we have an indexing time datapoint available in lucene nightly benchmarks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov merged pull request #2108: LUCENE-9626 represent HNSW graph neighbors using primitive arrays

Posted by GitBox <gi...@apache.org>.
msokolov merged pull request #2108:
URL: https://github.com/apache/lucene-solr/pull/2108


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov commented on pull request #2108: LUCENE-9626 represent HNSW graph neighbors using primitive arrays

Posted by GitBox <gi...@apache.org>.
msokolov commented on pull request #2108:
URL: https://github.com/apache/lucene-solr/pull/2108#issuecomment-741790004


   I found some problems with the first implementation posted here, which used a tricky data structure combining parallel arrays with a separate sortable key held in an (int) priority queue. It had bugs, and I realized was wasteful too. This replaces the int queue with a long queue, where the parallel arrays (float score + int node id) are folded together into a single (sortable by score) long. I measured 20% index time improvement (over the baseline, not the previous impl here).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org