Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/04 20:57:59 UTC

[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

mayya-sharipova commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1236414077

   @msokolov Thanks for your feedback.
   
   I have done a similar analysis comparing the 9.3 branch with this change on `glove-100-angular` (1 million documents, M:16, efConstruction:100).
   
   **Results with 9.3:**
   
   IndexingChain::ramBytesUsed() reports 497089200 bytes, or **497 MB**
   
   **Results with current change**
   
   IndexingChain::ramBytesUsed() reports memory for vectors: 497075904 bytes; memory for graph: 379073392 bytes; so total: **876 MB**
   
   
   So, you are right: much more memory is used during indexing. For the graph we need at least 16 * M * number_nodes bytes:
   - 2 * M neighbours for each node on the lowest level * 8 bytes (4 bytes for the neighbour node number + 4 bytes for the neighbour score)
   - so indeed, if the indexing memory buffer is set lower than that, we would end up with many more segments, which is not desirable.
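
   As a rough sketch of the arithmetic above for this benchmark: the class and method names below are hypothetical and not part of Lucene. It only evaluates the 16 * M * number_nodes lower bound and compares it with the byte counts reported earlier in this comment.

```java
// Hypothetical helper, not a Lucene API: evaluates the lower bound
// 16 * M * number_nodes for the level-0 neighbour arrays of an HNSW graph.
final class HnswGraphMemoryEstimate {

  // 4 bytes for the neighbour node number + 4 bytes for the neighbour score.
  static final long BYTES_PER_NEIGHBOUR = 4 + 4;

  // Lower bound in bytes: each node holds up to 2 * M neighbours on the lowest level.
  static long minGraphBytes(int m, long numNodes) {
    return 2L * m * BYTES_PER_NEIGHBOUR * numNodes;
  }

  public static void main(String[] args) {
    int m = 16;                 // M used in the benchmark above
    long numNodes = 1_000_000L; // 1 million documents

    long lowerBound = minGraphBytes(m, numNodes);
    long reportedGraph = 379_073_392L;   // "memory graph" reported above
    long reportedVectors = 497_075_904L; // "memory vectors" reported above

    System.out.println("lower bound for graph (bytes): " + lowerBound);
    System.out.println("reported graph (bytes):        " + reportedGraph);
    System.out.println("reported total (bytes):        " + (reportedVectors + reportedGraph));
  }
}
```

   For M = 16 and 1 million nodes the bound is 256000000 bytes (256 MB), so the reported 379073392 bytes for the graph sits above it, as expected for an "at least" estimate.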
   
   
   ----
   > I wonder if we should consider rolling back the "build graph during indexing" change? It seems to make indexing take > 10% longer and of course requires more RAM, which will tend to make more and smaller segments; not a desirable outcome.
   
   Thanks for the suggestion. I will discuss this with our team on Tuesday and get back to you.
   
   One thing I wonder about: we did not observe a longer total indexing time (indexing + refresh time combined). Did the combined indexing + refresh time become larger for you?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

