You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "jpountz (via GitHub)" <gi...@apache.org> on 2023/01/23 20:34:06 UTC

[GitHub] [lucene] jpountz commented on a diff in pull request #12102: Replace BytesRef usages in byte vectors API with byte[]

jpountz commented on code in PR #12102:
URL: https://github.com/apache/lucene/pull/12102#discussion_r1084511508


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -182,12 +182,12 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index) {
   }
 
   /** Returns the sum of squared differences of the two vectors. */
-  public static float squareDistance(BytesRef a, BytesRef b) {
+  public static float squareDistance(byte[] a, byte[] b) {
     // Note: this will not overflow if dim < 2^18, since max(byte * byte) = 2^14.
     int squareSum = 0;
-    int aOffset = a.offset, bOffset = b.offset;
+    int aOffset = 0, bOffset = 0;
     for (int i = 0; i < a.length; i++) {
-      int diff = a.bytes[aOffset++] - b.bytes[bOffset++];
+      int diff = a[aOffset++] - b[bOffset++];

Review Comment:
   remove the aOffset and bOffset local variables and use i instead?



##########
lucene/core/src/test/org/apache/lucene/document/TestField.java:
##########
@@ -637,14 +637,17 @@ public void testKnnVectorField() throws Exception {
     try (Directory dir = newDirectory();
         IndexWriter w = new IndexWriter(dir, newIndexWriterConfig())) {
       Document doc = new Document();
-      BytesRef br = newBytesRef(new byte[5]);
-      Field field = new KnnByteVectorField("binary", br, VectorSimilarityFunction.EUCLIDEAN);
+      byte[] b = new byte[5];
+      KnnByteVectorField field =
+          new KnnByteVectorField("binary", b, VectorSimilarityFunction.EUCLIDEAN);
+      assertNull(field.binaryValue());
+      assertEquals(b, field.vectorValue());

Review Comment:
   should this be `assertArrayEquals`? Otherwise I think it's checking instance equality?



##########
lucene/core/src/test/org/apache/lucene/util/TestVectorUtil.java:
##########
@@ -161,41 +161,41 @@ public static float[] randomVector(int dim) {
     return v;
   }
 
-  private static BytesRef randomVectorBytes() {
+  private static byte[] randomVectorBytes() {
     BytesRef v = TestUtil.randomBinaryTerm(random(), TestUtil.nextInt(random(), 1, 100));
     // clip at -127 to avoid overflow
     for (int i = v.offset; i < v.offset + v.length; i++) {
       if (v.bytes[i] == -128) {
         v.bytes[i] = -127;
       }
     }
-    return v;
+    return v.bytes;

Review Comment:
   this assumes that v.offset is 0 and v.length is the array length? can you add an assert or make a copy of the sub array?



##########
lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java:
##########
@@ -534,10 +532,10 @@ float[] next() throws IOException {
       return target;
     }
 
-    BytesRef nextBytes() throws IOException {
+    byte[] nextBytes() throws IOException {
       readNext();
       bytes.get(scratch);
-      return bytesRef;
+      return scratch;

Review Comment:
   I wonder if this class belongs in Lucene or if it should move to luceneutil



##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -251,12 +251,12 @@ public static void add(float[] u, float[] v) {
    * @param b bytes containing another vector, of the same dimension
    * @return the value of the dot product of the two vectors
    */
-  public static float dotProduct(BytesRef a, BytesRef b) {
+  public static float dotProduct(byte[] a, byte[] b) {
     assert a.length == b.length;
     int total = 0;
-    int aOffset = a.offset, bOffset = b.offset;
+    int aOffset = 0, bOffset = 0;
     for (int i = 0; i < a.length; i++) {
-      total += a.bytes[aOffset++] * b.bytes[bOffset++];
+      total += a[aOffset++] * b[bOffset++];

Review Comment:
   remove the aOffset and bOffset local variables and use i instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org