Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/03/17 19:49:30 UTC

[GitHub] [lucene] msokolov opened a new pull request #20: LUCENE-9844: document disk layout of Lucene90VectorFormat

msokolov opened a new pull request #20:
URL: https://github.com/apache/lucene/pull/20


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on pull request #20: LUCENE-9844: document disk layout of Lucene90VectorFormat

Posted by GitBox <gi...@apache.org>.
msokolov commented on pull request #20:
URL: https://github.com/apache/lucene/pull/20#issuecomment-801932289


   I think your idea of eliminating the docid "map" for the dense case is exactly right. In fact, we can probably optimize the other (sparse) case further too, using PackedInts.
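
To illustrate the sparse-case idea, here is a minimal sketch of fixed-width bit packing in plain Java. It is not Lucene's PackedInts API and not what the format does today: the class `PackedDocIds` and its methods are hypothetical names, and the sketch assumes every docid lies in the range 0 to `maxDoc - 1`.

```java
/**
 * Hypothetical illustration of fixed-width bit packing for docids.
 * This stands in for the idea behind PackedInts; it is NOT Lucene's API.
 * Assumption: every docid is in [0, maxDoc).
 */
final class PackedDocIds {
  private final long[] blocks;
  private final int bitsPerValue;
  private final int size;

  PackedDocIds(int[] docIds, int maxDoc) {
    // Bits needed to represent the largest possible docid (maxDoc - 1), at least 1.
    this.bitsPerValue =
        Math.max(1, 32 - Integer.numberOfLeadingZeros(Math.max(0, maxDoc - 1)));
    this.size = docIds.length;
    long totalBits = (long) size * bitsPerValue;
    this.blocks = new long[(int) ((totalBits + 63) / 64)];
    for (int i = 0; i < size; i++) {
      long bitPos = (long) i * bitsPerValue;
      int block = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long value = docIds[i];
      blocks[block] |= value << shift;
      if (shift + bitsPerValue > 64) {
        // The value straddles two 64-bit blocks; store the high bits in the next one.
        blocks[block + 1] |= value >>> (64 - shift);
      }
    }
  }

  /** Returns the docid stored at the given position. */
  int get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = blocks[block] >>> shift;
    if (shift + bitsPerValue > 64) {
      value |= blocks[block + 1] << (64 - shift);
    }
    return (int) (value & ((1L << bitsPerValue) - 1));
  }

  int size() {
    return size;
  }

  int bitsPerValue() {
    return bitsPerValue;
  }

  /** Bytes used by the packed blocks, for comparison with 4 bytes per docid in an int[]. */
  long packedBytes() {
    return 8L * blocks.length;
  }
}
```

With `maxDoc` = 1000, for instance, each docid needs 10 bits in this scheme rather than the 32 bits of an `int[]` entry. Whether the saving is worth the extra shifting on each lookup would need to be measured.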




[GitHub] [lucene] rmuir commented on pull request #20: LUCENE-9844: document disk layout of Lucene90VectorFormat

Posted by GitBox <gi...@apache.org>.
rmuir commented on pull request #20:
URL: https://github.com/apache/lucene/pull/20#issuecomment-801576257


   These docs have already helped me confirm what I thought I understood from the code about how sparsity and the like are currently handled. Reading the docs is an easy way to think about what is happening, and it is different from reading the code. It lets people come up with ideas without going through the code.
   
   For example, looking at what you describe here, I think there might be a simple optimization for the dense case (a "fixed" schema where vectors are present for every doc): if `the number of documents having values for this field` == `maxdoc`, we can omit writing the next item (`the docids of documents having vectors, in order`) completely. That saves some disk space, and we can just make the array null, which saves memory (4 bytes per document) and avoids Arrays.binarySearch in `advance()`.
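
The dense-case idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Lucene90VectorFormat reader: the class `VectorDocIds` and its methods are hypothetical names, and the sketch assumes the docids are unique, sorted, and all less than `maxDoc`, so that a count equal to `maxDoc` implies docid == ordinal.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the dense-case shortcut; not Lucene's reader code.
 * Assumption: docids are unique, sorted ascending, and all in [0, maxDoc).
 */
final class VectorDocIds {
  private final int[] docIds; // null in the dense case: nothing stored at all
  private final int maxDoc;

  VectorDocIds(int[] sortedDocIdsWithVectors, int maxDoc) {
    // If every document has a vector, the list must be 0..maxDoc-1, so drop it.
    this.docIds =
        sortedDocIdsWithVectors.length == maxDoc ? null : sortedDocIdsWithVectors.clone();
    this.maxDoc = maxDoc;
  }

  boolean isDense() {
    return docIds == null;
  }

  /** Returns the ordinal of the first vector whose docid is >= target, or -1 if none. */
  int advanceOrd(int target) {
    target = Math.max(target, 0);
    if (docIds == null) {
      // Dense: ordinal and docid coincide, so no search is needed.
      return target < maxDoc ? target : -1;
    }
    int idx = Arrays.binarySearch(docIds, target);
    if (idx < 0) {
      idx = -idx - 1; // convert "not found" result to the insertion point
    }
    return idx < docIds.length ? idx : -1;
  }

  /** Maps a vector ordinal back to its docid. */
  int docId(int ord) {
    return docIds == null ? ord : docIds[ord];
  }
}
```

In the dense case the sketch keeps no array and `advanceOrd` is a bounds check; the sparse case still pays for the `int[]` and the binary search.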
   
   But I will look at the code to try to really confirm that and play with it. Thanks again.




[GitHub] [lucene] msokolov merged pull request #20: LUCENE-9844: document disk layout of Lucene90VectorFormat

Posted by GitBox <gi...@apache.org>.
msokolov merged pull request #20:
URL: https://github.com/apache/lucene/pull/20


   

