You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/20 20:58:36 UTC

[GitHub] [lucene] msokolov opened a new pull request, #913: Lucene 10577

msokolov opened a new pull request, #913:
URL: https://github.com/apache/lucene/pull/913

   This patch revisits the idea of storing vectors in 8-bit precision. It incorporates some @rcmuir and @jpountz suggestions from https://github.com/apache/lucene/pull/899 which was the first attempt. The highlights here are:
   
   1. Adds a new VectorSimilarity to represent this use case
   2. Expects vectors to be supplied (as float[]) with values in the range [-128,127]
   3. All dot-product operations are done over byte using integer math when this similarity is used
   
   The good parts:
   
   Index size is reduced; I ran a test with 100K (200-dimensional) GloVe vectors and the index size goes from 231M to 173M for an M=200 graph and 154M->97M for an M=100 graph. The vector portion of the storage (.vec files) is reduced nearly 75% as expected.
   
   With careful tuning of the scaling factor, recall is pretty much unchanged compared to the full-precision baseline. I think these GloVe vectors are not well-conditioned numerically, so to do the scaling I did not use min/max values as the limits, rather I discarded outlying values and used 0.1/0.99 pctile values.
   
   Indexing and search performance as measured with KnnGraphTester improved somewhere in the range of 15-25%. I'd like to run luceneutil and/or ann-benchmarks to get another look, but this was a very pleasant surprise. I think the prospect is bright for vectorization too, but this improvement is achievable using our scalar impl.
   
   The not-so-good-parts:
   
   I think we probably wouldn't want to commit as-is since it would make changes to the codec that aren't forwards-compatible. To go forward I guess we'd need to create a new version of the HnswVectorsFormat.
   
   I had to add some branches and generics to handle the two different data representations. I don't think it's too terribly ugly, but I know we've had concerns about this kind of thing in the past - maybe there's a better way that doesn't require massive code duplication?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878550812


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   Is this really the way the function should work?
   
   it is doing the equivalent of casting both `a` and `b` to `int` first, and then adding result to another `int`. That's just how java promotion works.
   
   But is this what we want? What about overflow? why `int` and not `long`? 
   
   This is not at all what `ByteVector.mul()` does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878557225


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   yeah, I see that now. Well ... this *is* what we want it to do. I mean we could potentially reduce the precision still further (to ~7~ umm 4! bits) and then each pair of bytes could multiplied safely. But we do still need to add them up, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878555057


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   well its going to multiply each `a` byte * each `b` byte and store each result in `c` byte. So you will overflow right there, before you even do any sum.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133402474

   hm this "luke-can-be-launched" test failed for me locally too. I guess it needs a JVM with a head?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878562615


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   I think our result vector could be ShortVector and to try to use conversion (in some way that isnt slow) such as https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#B2S in a way that is efficient? 
   
   for the adding-up part, with the floats we did reduceLanes(), so I think we'd have to use same trick with conversion of result to IntVector/LongVector to create the sum.
   
   There are gigantic paragraphs on Vector.java javadoc on how to do these conversions (maybe they are actually efficient?), but I haven't tried.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878686983


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   > i wonder if we can coerce autovectorization of this loop somehow too.
   
   That would be cool. Continuing down the Vector track, I ran some JMH benchmarks. I don't want to get too deep into it here, but it does seem that `Vector<Short>.castShape()` followed by `reduceLanesToLong` produces the correct result, is quite a bit faster than the equivalent operation over `FloatVector`, and is nearly as fast as adding up and accumulating Bytes (which produces the wrong result).`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
mocobeta commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133536049

   > Looks like some other issue blocking the forked process.
   
   Yes I think it could completely hang for some reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
mocobeta commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133532803

   Also, it should run on headless JVM; we are running it on CI servers without displays.
   https://github.com/apache/lucene/blob/71a9acb2e2aa55257021eefce1e5d8d390bc7048/lucene/luke/src/java/org/apache/lucene/luke/app/desktop/LukeMain.java#L75-L78


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878557225


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   yeah, I see that now. Well ... this *is* what we want it to do. I mean we could potentially reduce the precision still further (to 7 bits) and then each pair of bytes could multiplied safely. But we do still need to add them up, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878580201


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   I will give castShape/convert/convertShape a try. I also found reduceLanesToLong that seems promising



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mocobeta commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
mocobeta commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133529000

   > hm this "luke-can-be-launched" test failed for me locally too. I guess it needs a JVM with a head?
   
   It's not a part of the mandatory test suite but you can run it headlessly (if you are on Linux). Please see: https://github.com/apache/lucene/blob/71a9acb2e2aa55257021eefce1e5d8d390bc7048/help/tests.txt#L131
   
   As for the occasional test failure of the GUI test for Luke, I'm planning to disable the test on windows VM - it looks too slow in GitHub Actions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133533455

   > I'm planning to disable the test on windows VM - it looks too slow in GitHub Actions.
   
   I'm not sure it's the slowness anymore. Looks like some other issue blocking the forked process.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133979777

   I ran luceneutil (after adding support for this new encoding, and scaling the vectors used on the build the candidate index and queries) and got decent results:
   
          Task QPS  | baseline |      StdDev | candidate  |      StdDev |                Pct diff | p-value
    --------------- | -------: | ----------- | ---------: | ----------- | ----------------------- | --------- :
     PKLookup       |   133.47 |     (18.7%) |     130.73 |     (24.6%) |  -2.0% ( -38% -   50%)  | 0.767
    LowTermVector   |  1024.29 |     (15.6%) |    1140.88 |     (16.3%) |   11.4% ( -17% -   51%) | 0.024
    MedTermVector   |   727.57 |   (8.6%)    |     850.54 |     (14.0%) |  16.9% (  -5% -   43%)  | 0.000
   AndHighMedVector |   934.28 |    (8.9%)   |    1105.48 |     (15.0%) |   18.3% (  -5% -   46%) | 0.000
   AndHighHighVector|   789.04 |   (11.1%)   |     947.74 |     (16.1%) |   20.1% (  -6% -   53%) | 0.000
   AndHighLowVector |   910.39 |   (13.2%)   |    1219.95 |     (22.2%) |   34.0% (  -1% -   79%) | 0.000
    HighTermVector  |   674.78 |    (8.8%)   |     915.99 |     (17.9%) |   35.7% (   8% -   68%) | 0.000
   
   I have no explanation for the drop in PKLookup -- I checked the two indexes and they have the same named files, all the same sizes except for the .vec and .vem files, as expected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on PR #913:
URL: https://github.com/apache/lucene/pull/913#issuecomment-1133605270

   > It's not a part of the mandatory test suite but you can run it headlessly with a virtual display (if you are on Linux). Please see: 
   Thanks, I guess it runs as part of `check` tasks? Anyway after installing xvfb I get a pass; thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov closed pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov closed pull request #913: Lucene 10577
URL: https://github.com/apache/lucene/pull/913


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878641687


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   i wonder if we can coerce autovectorization of this loop somehow too. it is also worth looking into as it wouldn't require any vector api at all, maybe just manipulating the code. it is integer math: doesn't have the restrictions that floating point does (order of operations etc), so theoretically the compiler could do it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878686983


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   > i wonder if we can coerce autovectorization of this loop somehow too.
   That would be cool. Continuing down the Vector track, I ran some JMH benchmarks. I don't want to get too deep into it here, but it does seem that `Vector<Short>.castShape()` followed by `reduceLanesToLong` produces the correct result, is quite a bit faster than the equivalent operation over `FloatVector`, and is nearly as fast as adding up and accumulating Bytes (which produces the wrong result).`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a diff in pull request #913: Lucene 10577

Posted by GitBox <gi...@apache.org>.
msokolov commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878553114


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   hm, well in the worst case (128 * 128) = 2^7*2^7 = 2^14 and we can sum lots of these in an int without any concern about overflow. I think the limit is on the dimension of the vector; I mean 2^31/2^14 = 2^17 and I don't think we would ever have a 2^17-dimensional vector. Perhaps this gives us a principled way to choose that max dimension :) I guess 2^17 = 128K.
   
   Looking at what `ByteVector.mul()` does ...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org