Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/08/20 12:38:25 UTC

[GitHub] [lucene] msokolov commented on a diff in pull request #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef

msokolov commented on code in PR #1076:
URL: https://github.com/apache/lucene/pull/1076#discussion_r950691134


##########
lucene/core/src/java/org/apache/lucene/index/VectorEncoding.java:
##########
@@ -21,12 +21,8 @@
 public enum VectorEncoding {
 
   /**
-   * Encodes vector using 8 bits of precision per sample. Use only with DOT_PRODUCT similarity.
-   * NOTE: this can enable significant storage savings and faster searches, at the cost of some
-   * possible loss of precision. In order to use it, all vectors must be of the same norm, as
-   * measured by the sum of the squares of the scalar values, and those values must be in the range
-   * [-128, 127]. This applies to both document and query vectors. Using nonconforming vectors can
-   * result in errors or poor search results.
+   * Encodes vector using 8 bits of precision per sample. NOTE: this can enable significant storage

Review Comment:
   I added back a comment about the range requirement, and also added the safety checks to `toBytesRef`. You convinced me it's too dangerous otherwise, and in any case we only do it a few times per query (per segment). Maybe later we can move it to the codec so that we only do it once per query.
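
   For readers following along, here is a minimal sketch of the kind of range check discussed above. It is not the code from this PR: the class name `VectorRangeCheck` and the method `toBytesChecked` are hypothetical, and it returns a plain `byte[]` rather than a `BytesRef` so that it compiles without Lucene on the classpath. It assumes the check simply rejects any sample outside [-128, 127] before narrowing to a byte.

```java
// Hypothetical illustration only; not the actual KnnVectorField.toBytesRef implementation.
final class VectorRangeCheck {

  private VectorRangeCheck() {}

  /**
   * Narrows each float sample to a byte, rejecting values outside [-128, 127].
   * The negated comparison also rejects NaN, which would otherwise be cast to 0 silently.
   */
  static byte[] toBytesChecked(float[] vector) {
    byte[] bytes = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float v = vector[i];
      if (!(v >= -128f && v <= 127f)) {
        throw new IllegalArgumentException(
            "Vector value at index " + i + " is out of range [-128, 127]: " + v);
      }
      // The cast truncates toward zero, so fractional parts are dropped.
      bytes[i] = (byte) v;
    }
    return bytes;
  }
}
```

   The check is a single pass over the vector, so its cost is linear in the dimension each time a query vector is converted, which is why running it a few times per query is cheap.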



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
