You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/19 13:53:35 UTC
[GitHub] [lucene] msokolov commented on a diff in pull request #899: Lucene 10577

msokolov commented on code in PR #899:
URL: https://github.com/apache/lucene/pull/899#discussion_r877085508


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene92/ExpandingRandomAccessVectorValues.java:
##########
@@ -0,0 +1,57 @@
+package org.apache.lucene.codecs.lucene92;
+
+import org.apache.lucene.index.RandomAccessVectorValues;
+import org.apache.lucene.index.RandomAccessVectorValuesProducer;
+import org.apache.lucene.util.BytesRef;
+
+import java.io.IOException;
+
+public class ExpandingRandomAccessVectorValues implements RandomAccessVectorValuesProducer {
+
+  private final RandomAccessVectorValuesProducer delegate;
+  private final float scale;
+
+  /**
+   * Wraps an existing vector values producer. Floating point vector values will be produced by scaling
+   * byte-quantized values read from the values produced by the input.
+   */
+  protected ExpandingRandomAccessVectorValues(RandomAccessVectorValuesProducer in, float scale) {
+    this.delegate = in;
+    assert scale != 0;
+    this.scale = scale;
+  }
+
+  @Override
+  public RandomAccessVectorValues randomAccess() throws IOException {
+    RandomAccessVectorValues delegateValues = delegate.randomAccess();
+    float[] value  = new float[delegateValues.dimension()];;
+
+    return new RandomAccessVectorValues() {
+
+      @Override
+      public int size() {
+        return delegateValues.size();
+      }
+
+      @Override
+      public int dimension() {
+        return delegateValues.dimension();
+      }
+
+      @Override
+      public float[] vectorValue(int targetOrd) throws IOException {
+        BytesRef binaryValue = delegateValues.binaryValue(targetOrd);
+        byte[] bytes = binaryValue.bytes;
+        for (int i = 0, j = binaryValue.offset; i < value.length; i++, j++) {
+          value[i] = bytes[j] * scale;

Review Comment:
   Well, there are definitely byte-oriented vectors. I don't think we should try to use some kind of 8-bit floating point (what would that do, have an exponent and a mantissa?) rather if we can scale to use the entire 8 bits as significant bits (no exponent) then we maximize the precision and can use the native instruction set, which ByteVector does seem to be accessing (based on some preliminary JMH I ran it seems quite a bit faster than 32-bit FloatVector).
   
   For now I see two directions: (1) figure out how to scale vectors and store as signed bytes at full precision (this issue), and (2) work up performant Vector API implementations of that scaling and the dot product in anticipation of the vector API hatching along the lines of the patch you already did, @rcmuir. Maybe we can push that to a branch that builds with `--add-modules jdk.incubator.vector` for posterity and ease of testing. I don't think we need to block (1) for (2) since it seems clear there will be a path to vectorization. In the meantime we can implement a byte-at-a-time dot product to replace the float-at-a-time dot product?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org