Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/11/30 20:27:30 UTC

[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1043: LUCENE-9071: Speed up BM25 scores.

bruno-roustant commented on a change in pull request #1043: LUCENE-9071: Speed up BM25 scores.
URL: https://github.com/apache/lucene-solr/pull/1043#discussion_r352157819
 
 

 ##########
 File path: lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java
 ##########
 @@ -221,8 +251,8 @@ public final SimScorer scorer(float boost, CollectionStatistics collectionStats,
 
     @Override
     public float score(float freq, long encodedNorm) {
-      double norm = cache[((byte) encodedNorm) & 0xFF];
-      return weight * (float) (freq / (freq + norm));
+      float norm = cache[((byte) encodedNorm) & 0xFF];
+      return weight * tf(freq, norm);
 
 Review comment:
   As I understand it, this is the line that delivers the optimization. Indeed, casting to double and then back to float has a cost. I'm surprised that it matters for overall query throughput. The cost is on the order of a couple of ns per call, so the impact only becomes visible when a large number of scores (millions) are computed, yes.
   If freq, norm and score were double, we wouldn't have any casts, the speed would be good on 64-bit machines, and we wouldn't need this optimization. Could this API be changed in 9.x to use double instead of float?
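
   To make the cast discussion concrete, here is a minimal sketch, not the code from the pull request. The class and method names are hypothetical. `scoreViaDouble` mirrors the two removed lines of the diff above. `scoreFloatOnly` is only an assumption about what a float-only computation could look like, since the body of `tf(freq, norm)` is not shown in this hunk.

```java
// Hypothetical sketch; class and method names are illustrative, not Lucene API.
class ScoreCastSketch {

    // Mirrors the removed lines: norm is a double, so freq is widened to double,
    // the division runs in double precision, and the result is narrowed to float.
    static float scoreViaDouble(float weight, float freq, double norm) {
        return weight * (float) (freq / (freq + norm));
    }

    // Assumed float-only variant: every operand is a float, so no widening or
    // narrowing conversion is needed. The real tf(freq, norm) may differ.
    static float scoreFloatOnly(float weight, float freq, float norm) {
        return weight * (freq / (freq + norm));
    }

    public static void main(String[] args) {
        float weight = 2.2f;
        float freq = 3f;
        float norm = 1.5f;
        System.out.println("via double: " + scoreViaDouble(weight, freq, norm));
        System.out.println("float only: " + scoreFloatOnly(weight, freq, norm));
    }
}
```

   The two variants can differ in the last bits of the result, because the division is rounded at different precisions. Whether the conversions are measurable in practice would need a benchmark; this sketch only shows where they occur.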
