
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/08 23:47:51 UTC

[GitHub] [lucene] mayya-sharipova commented on pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors

mayya-sharipova commented on pull request #649:
URL: https://github.com/apache/lucene/pull/649#issuecomment-1033175828


   @jtibshirani @jpountz Thanks for your feedback. I've tried to address it in 6bf1aea543ddb3d19909a5bc3d9ccbb1b4fcf9e4. I've decided to focus this PR only on optimizing the dense case, and to keep the sparse case as it was before, i.e. uncompressed.
   
   > I wonder if it would be a better trade-off to keep ints uncompressed, but read them from disk directly instead of loading giant arrays in memory? Or possibly switch to something like DirectMonotonicReader if it doesn't slow down searches.
   
   @jpountz Thank you for the suggestion, Adrien. I've added this as a TODO in the code to explore. I am also wondering: since we use binarySearch on the docIds array, would it still be acceptable to have this array on disk? Do we have a precedent for such a case?
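
To make the on-disk question concrete, here is a minimal sketch, not Lucene code, of a binary search that reads sorted doc IDs through a random-access view instead of a heap `int[]`. The `IntAccessor` interface and the class name are hypothetical, and a `ByteBuffer` stands in for whatever off-heap or memory-mapped storage would actually hold the IDs. The point it illustrates is that each lookup touches only about log2(n) entries, so the whole array never has to be loaded into memory.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class DocIdBinarySearch {

    /** Hypothetical random-access view over sorted doc IDs (an assumption, not a Lucene API). */
    interface IntAccessor {
        int size();
        int get(int index);
    }

    /** Wraps a buffer of 4-byte ints; the buffer stands in for on-disk or memory-mapped data. */
    static IntAccessor fromBuffer(ByteBuffer buffer) {
        IntBuffer ints = buffer.asIntBuffer();
        return new IntAccessor() {
            public int size() { return ints.limit(); }
            public int get(int index) { return ints.get(index); }
        };
    }

    /**
     * Same contract as java.util.Arrays.binarySearch: returns the index of target,
     * or -(insertionPoint + 1) if target is absent. Performs O(log n) reads.
     */
    static int binarySearch(IntAccessor docIds, int target) {
        int lo = 0;
        int hi = docIds.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // unsigned shift avoids int overflow
            int doc = docIds.get(mid);   // one random read per iteration
            if (doc < target) {
                lo = mid + 1;
            } else if (doc > target) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    public static void main(String[] args) {
        int[] sorted = {3, 8, 15, 42, 100};
        ByteBuffer buffer = ByteBuffer.allocate(sorted.length * Integer.BYTES);
        for (int doc : sorted) {
            buffer.putInt(doc);
        }
        buffer.flip();
        IntAccessor docIds = fromBuffer(buffer);
        System.out.println(binarySearch(docIds, 42)); // prints 3
        System.out.println(binarySearch(docIds, 9));  // prints -3 (insertion point 2)
    }
}
```

Whether the random reads are acceptable in practice depends on how often the search runs per query and on whether the pages holding the IDs stay in the OS cache, which this sketch does not measure.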
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


