You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/05/27 09:58:54 UTC

[GitHub] [lucene-solr] iverase opened a new pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

iverase opened a new pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538


   This change changes the way we read docIds from the index using readLELongs so we can read most of the docs in one batch into a temporary array. This allows the compiler to run more efficient loops as well as we are copying information between two arrays.
   
   In addition we add two need encoding paths for int8 and int16.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-1030029995


   closing it as there has been a better proposal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-635373369


   >>> Should we increment the version number in BKDWriter to be able to remove the call to reverseBytes at search time?
   
   We can do that or add different code numbers. We can leave the readers for backwards compatibility?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] jpountz commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

jpountz commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-683344047


   I like that second idea better because I think it's more likely applicable. The issue with the first one is that it only works if you are indexing from a single thread, if you are indexing from multiple threads then the doc IDs will no longer be sorted after segments get merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] jpountz commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

jpountz commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-635385634


   > Adding a special case when docIds are just incrementing by one as well
   
   I can't think of many cases when that would kick in, this would only be used for single-valued fields that are used for index sorting?
   
   > Yes, that was my idea as a follow up and the reason I added the 8 /16 bits so we can use it later
   
   Do you mean something like stealing a bit of the byte we use to store the number of bits per value to record whether doc IDs are offseted by some constant? If so, then maybe we should do it now so 8/16 bits per value would have a chance of getting used on real data?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] mikemccand commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

mikemccand commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-634749641


   This looks promising @iverase!  Do you have benchmark results for this approach?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] jpountz commented on a change in pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#discussion_r431835951



##########
File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java
##########
@@ -498,6 +500,7 @@ public IntersectState(IndexInput in, int numDims,
       this.scratchMinIndexPackedValue = new byte[packedIndexBytesLength];
       this.scratchMaxIndexPackedValue = new byte[packedIndexBytesLength];
       this.index = indexVisitor;
+      this.scratchLongs = new long[maxPointsInLeafNode / 2];

Review comment:
       does it need to be 
   ```suggestion
         this.scratchLongs = new long[(maxPointsInLeafNode + 1) / 2];
   ```
   so that it keeps working for even numbers of points per node?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-635370073


   Yes, that was my idea as a follow up and the reason I added the 8 /16 bits so we can use it later on instead of delta decoding. Adding a special case when docIds are just incrementing by one as well (we are doing that for postings) so base-time index or even counter can greatly benefit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase edited a comment on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase edited a comment on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-635395493


   I did something naive that is to add in an index documents with an increasing integer in a loop. Then you kick in that optimisation. I thought if you are adding documents which have a timestamp on them and you are adding them chronologically, they can benefit of such optimisation. 
   
   The other idea is to remove `vInt` totally and record the lower value of the sorted array. Then you can just index the difference between the values of the array which gives more chances to have very low values. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase commented on a change in pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase commented on a change in pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#discussion_r431855701



##########
File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java
##########
@@ -498,6 +500,7 @@ public IntersectState(IndexInput in, int numDims,
       this.scratchMinIndexPackedValue = new byte[packedIndexBytesLength];
       this.scratchMaxIndexPackedValue = new byte[packedIndexBytesLength];
       this.index = indexVisitor;
+      this.scratchLongs = new long[maxPointsInLeafNode / 2];

Review comment:
       Not really. We read as many longs as we can so if the count is un-even for 32bits integer, the last four bytes are read from the index. See Line 120 of DocIdsWriter.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase commented on pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase commented on pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538#issuecomment-635395493


   I did something naive that is to add in an index documents with an increasing integer in a loop. Then you kick in that optimisation. I thought if you are adding documents which have a timestamp on them and you are adding them chronologically, they can benefit of such optimisation. 
   
   The other idea is to remove `vInt` totally and record the lower value of the sorted array. Then you can just index the difference between the values of the array which gives more changes to have very low values. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene-solr] iverase closed pull request #1538: LUCENE-9368: Use readLELongs to read docIds on BKD leaf nodes

Posted by GitBox <gi...@apache.org>.

iverase closed pull request #1538:
URL: https://github.com/apache/lucene-solr/pull/1538


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org