Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/05 12:51:32 UTC

[GitHub] [lucene] jpountz commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

jpountz commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1236975827

   I like the idea of exploring a combination of the current approach and on-disk buffering to flush less often.
   
   For the record, the approach of building the graph at flush time has a few other downsides that are not well captured by an indexing benchmark. Mike mentioned that we use a similar amount of memory at flush time (though it is more transient), but there is also the stalling logic at https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java#L286-L294, which waits until flushing segments + buffered segments use 2x the size of the RAM buffer before stalling indexing. Because flushes take a very long time when the graph is built at flush time, it becomes more likely that IndexWriter goes over (up to 2x) the amount of RAM it is allowed to spend on the indexing buffer (which could be surprising to users on its own, and could cause OOMEs) and that indexing gets stalled (which can also be surprising to users). Maybe getting rid of this downside is worth losing a bit of indexing throughput.
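   To make the 2x rule concrete, here is a minimal, hypothetical model of the stall condition as described in this comment: stall once memory held by buffered segments plus memory held by segments still being flushed exceeds twice the configured RAM buffer. The class and method names (StallModel, shouldStall, stallLimitBytes) are illustrative assumptions, not Lucene's actual DocumentsWriterFlushControl code, which has additional conditions not modeled here.

```java
/**
 * Hypothetical, simplified model of the stall rule discussed above.
 * NOT Lucene's DocumentsWriterFlushControl; names are illustrative only.
 */
public final class StallModel {

    /** Assumed limit: twice the configured indexing RAM buffer. */
    public static long stallLimitBytes(long ramBufferBytes) {
        return 2 * ramBufferBytes;
    }

    /**
     * @param activeBytes    RAM held by segments still being indexed into (buffered)
     * @param flushBytes     RAM held by segments that are currently flushing
     * @param ramBufferBytes configured indexing RAM buffer size
     * @return true if indexing threads would be stalled under this model
     */
    public static boolean shouldStall(long activeBytes, long flushBytes, long ramBufferBytes) {
        // Memory of flushing segments is only released when their flush completes,
        // so slow flushes keep flushBytes high and push the total toward the limit.
        return activeBytes + flushBytes > stallLimitBytes(ramBufferBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long ramBuffer = 16 * mb;

        // Fast flush: little memory tied up in flushing segments, total stays under 2x.
        System.out.println(shouldStall(16 * mb, 4 * mb, ramBuffer));
        // Slow flush (e.g. graph built at flush time): flushing segments hold their
        // memory longer, the total passes 2x the buffer, and indexing stalls.
        System.out.println(shouldStall(16 * mb, 20 * mb, ramBuffer));
    }
}
```

   Under this model the first call prints false (20 MB is under the 32 MB limit) and the second prints true (36 MB is over it). The point is that a longer flush does not change the limit; it only keeps flushBytes elevated for longer, which widens the window in which both the RAM overshoot and the stall can happen.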
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@lucene.apache.org