Posted to commits@lucene.apache.org by ju...@apache.org on 2022/10/19 21:00:55 UTC

[lucene] branch branch_9x updated: Fix Lucene94HnswVectorsFormat validation on large segments (#11861)

This is an automated email from the ASF dual-hosted git repository.

julietibs pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/lucene.git


The following commit(s) were added to refs/heads/branch_9x by this push:
     new ca794c6ec0d Fix Lucene94HnswVectorsFormat validation on large segments (#11861)
ca794c6ec0d is described below

commit ca794c6ec0da915c037dc01d71a9da2b5072bc7b
Author: Julie Tibshirani <ju...@apache.org>
AuthorDate: Wed Oct 19 13:49:59 2022 -0700

    Fix Lucene94HnswVectorsFormat validation on large segments (#11861)
    
    When reading large segments, the vectors format can fail with a validation
    error:
    
    java.lang.IllegalStateException: Vector data length 3070061568 not matching
    size=999369 * dim=768 * byteSize=4 = -1224905728
    
    The problem is that the expected size is computed as a 32-bit int, which
    overflows once the vector data exceeds Integer.MAX_VALUE bytes. The bug
    crept in during the work to enable int8 values, which switched a long
    value to an int.
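
    The overflow can be reproduced outside Lucene with the numbers from the
    error message above. The following is a minimal standalone sketch, not
    code from the Lucene repository: the class name VectorSizeOverflow and
    the two helper methods are hypothetical, and only the arithmetic mirrors
    the removed and added lines of the reader.

```java
public class VectorSizeOverflow {

    // Mirrors the old expression: all operands are int, so the product
    // silently wraps around once it exceeds Integer.MAX_VALUE.
    static int numBytesInt(int size, int dimension, int byteSize) {
        return size * dimension * byteSize;
    }

    // Mirrors the fixed expression: widen to long before multiplying, and
    // use Math.multiplyExact so a long overflow throws ArithmeticException
    // instead of wrapping.
    static long numBytesLong(int size, int dimension, int byteSize) {
        long vectorBytes = Math.multiplyExact((long) dimension, byteSize);
        return Math.multiplyExact(vectorBytes, size);
    }

    public static void main(String[] args) {
        int size = 999369;          // vector count from the error message
        int dimension = 768;        // vector dimension from the error message
        int byteSize = Float.BYTES; // 4 bytes per float32 component

        System.out.println(numBytesInt(size, dimension, byteSize));  // -1224905728
        System.out.println(numBytesLong(size, dimension, byteSize)); // 3070061568
    }
}
```

    The int version wraps to the negative value shown in the exception,
    while the long version matches the recorded vector data length.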
---
 lucene/CHANGES.txt                                                | 8 ++++++++
 .../apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java  | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index 9b92c75f054..82f2f597a38 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -65,6 +65,14 @@ Other
 
 * LUCENE-10635: Ensure test coverage for WANDScorer by using a test query. (Zach Chen, Adrien Grand)
 
+======================== Lucene 9.4.1 =======================
+
+Bug Fixes
+---------------------
+* GITHUB#11858: Fix kNN vectors format validation on large segments. This
+ addresses a regression in 9.4.0 where validation could fail, preventing
+ further writes or searches on the index. (Julie Tibshirani)
+
 ======================== Lucene 9.4.0 =======================
 
 API Changes
diff --git a/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java b/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java
index 1d02d818f8b..035be10e41a 100644
--- a/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java
+++ b/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java
@@ -180,7 +180,9 @@ public final class Lucene94HnswVectorsReader extends KnnVectorsReader {
         byteSize = Float.BYTES;
         break;
     }
-    int numBytes = fieldEntry.size * dimension * byteSize;
+    long vectorBytes = Math.multiplyExact((long) dimension, byteSize);
+    long numBytes = Math.multiplyExact(vectorBytes, fieldEntry.size);
+
     if (numBytes != fieldEntry.vectorDataLength) {
       throw new IllegalStateException(
           "Vector data length "