Posted to issues@lucene.apache.org by "Mayya Sharipova (Jira)" <ji...@apache.org> on 2022/02/24 08:49:00 UTC

[jira] [Assigned] (LUCENE-10194) Should IndexWriter buffer KNN vectors on disk?

     [ https://issues.apache.org/jira/browse/LUCENE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayya Sharipova reassigned LUCENE-10194:
----------------------------------------

    Assignee: Mayya Sharipova

> Should IndexWriter buffer KNN vectors on disk?
> ----------------------------------------------
>
>                 Key: LUCENE-10194
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10194
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Mayya Sharipova
>            Priority: Minor
>
> VectorValuesWriter buffers data in memory, like we do for all data structures that are computed on flush. But I wonder if this is the right trade-off.
> The use case I have in mind is someone trying to load a dataset of vectors into Lucene. Given that HNSW graphs are very expensive to create, we'd ideally load that dataset into a single segment rather than into many small segments that then need to be merged together, which in turn re-creates the HNSW graph.
> Yet buffering vectors in memory is expensive. For instance, assuming 256 dimensions, each vector consumes 1 kB of memory (256 float values at 4 bytes each). Should we consider buffering vectors on disk to reduce the chances of having to create new segments only because the RAM buffer is full?
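
To make the arithmetic in the description concrete, here is a rough back-of-the-envelope sketch in Java. It is not Lucene code: the class and method names are hypothetical, the 16 MB buffer size is an illustrative assumption, and the estimate counts only the raw float values, ignoring any per-document or graph overhead, so the real number of buffered vectors would be lower.

```java
// Hypothetical helper, not part of Lucene: estimates how many raw float32
// vectors fit in an in-memory buffer of a given size.
public class VectorBufferEstimate {

    // Bytes needed for the raw values of one float32 vector.
    static long bytesPerVector(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    // Upper bound on the number of vectors that fit in the buffer,
    // ignoring all per-document and data-structure overhead.
    static long vectorsPerBuffer(double ramBufferMB, int dimensions) {
        long bufferBytes = (long) (ramBufferMB * 1024 * 1024);
        return bufferBytes / bytesPerVector(dimensions);
    }

    public static void main(String[] args) {
        int dimensions = 256;      // dimension count used in the issue description
        double ramBufferMB = 16.0; // assumed buffer size, for illustration only

        System.out.println(bytesPerVector(dimensions) + " bytes per vector");
        System.out.println(vectorsPerBuffer(ramBufferMB, dimensions)
            + " vectors before the buffer is full");
    }
}
```

Under these assumptions a 16 MB buffer holds at most 16384 vectors of 256 dimensions, so a dataset of millions of vectors would be flushed into many small segments, each with its own HNSW graph to build and later merge.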



