Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/02/04 00:01:00 UTC

[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations

    [ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486752#comment-17486752 ] 

ASF subversion and git services commented on LUCENE-10391:
----------------------------------------------------------

Commit 57d9515effb16c5e904bd52e325e70ffdb4135c4 in lucene's branch refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=57d9515 ]

LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls (#641)

A couple of the data structures used in HNSW search are pretty large and
expensive to allocate. This commit creates a shared candidates queue and
visited set that are reused across calls to HnswGraph#searchLevel. Now the same
data structures are used for building the entire graph, which can cut down on
allocations during indexing. For graph building it also switches the visited
set to FixedBitSet for better performance.
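
As a rough illustration of the reuse pattern described in this commit message, here is a
minimal, self-contained sketch. It is not the code from the commit: the class names
SearchScratch and ToyGraphSearch are hypothetical, java.util.BitSet stands in for Lucene's
FixedBitSet, and a plain reachability count stands in for the actual nearest-neighbor
search. It only shows the idea of allocating the candidates queue and visited set once and
clearing them at the start of each call.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

/** Hypothetical holder for scratch structures shared across searches. */
final class SearchScratch {
  final PriorityQueue<Integer> candidates = new PriorityQueue<>();
  final BitSet visited; // stand-in for Lucene's FixedBitSet (assumption)

  SearchScratch(int numNodes) {
    this.visited = new BitSet(numNodes);
  }

  /** Clears the contents but keeps the backing storage for the next call. */
  void reset() {
    candidates.clear();
    visited.clear();
  }
}

/** Toy graph traversal that reuses one SearchScratch for every call. */
final class ToyGraphSearch {
  private final int[][] neighbors;
  private final SearchScratch scratch; // allocated once, reused per search

  ToyGraphSearch(int[][] neighbors) {
    this.neighbors = neighbors;
    this.scratch = new SearchScratch(neighbors.length);
  }

  /** Counts the nodes reachable from entryPoint without allocating new scratch state. */
  int countReachable(int entryPoint) {
    scratch.reset();
    scratch.candidates.add(entryPoint);
    scratch.visited.set(entryPoint);
    int count = 0;
    while (!scratch.candidates.isEmpty()) {
      int node = scratch.candidates.poll();
      count++;
      for (int nb : neighbors[node]) {
        if (!scratch.visited.get(nb)) {
          scratch.visited.set(nb);
          scratch.candidates.add(nb);
        }
      }
    }
    return count;
  }
}
```

The trade-off in a sketch like this is that the searcher is no longer safe to share between
threads, since every call mutates the same scratch state.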

> Reuse data structures across HnswGraph invocations
> --------------------------------------------------
>
>                 Key: LUCENE-10391
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10391
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Assignee: Julie Tibshirani
>            Priority: Minor
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Creating HNSW graphs involves many repeated calls to HnswGraph#search. Profiles from the nightly benchmarks suggest that allocating its data structures accounts for both a lot of heap allocation (http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_heap) and a lot of CPU usage (http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_cpu). Reusing these data structures across invocations looks like low-hanging fruit that could save significant CPU.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
