Posted to commits@lucene.apache.org by ju...@apache.org on 2022/10/19 21:19:39 UTC

[lucene] branch branch_9_4 updated: Fix Lucene94HnswVectorsFormat validation on large segments (#11861)

This is an automated email from the ASF dual-hosted git repository.

julietibs pushed a commit to branch branch_9_4
in repository https://gitbox.apache.org/repos/asf/lucene.git


The following commit(s) were added to refs/heads/branch_9_4 by this push:
     new f8148fb2303 Fix Lucene94HnswVectorsFormat validation on large segments (#11861)
f8148fb2303 is described below

commit f8148fb230370d724cdbdddb2f3c910b48ff3abf
Author: Julie Tibshirani <ju...@apache.org>
AuthorDate: Wed Oct 19 13:49:59 2022 -0700

    Fix Lucene94HnswVectorsFormat validation on large segments (#11861)
    
    When reading large segments, the vectors format can fail with a validation
    error:
    
    java.lang.IllegalStateException: Vector data length 3070061568 not matching
    size=999369 * dim=768 * byteSize=4 = -1224905728
    
    The problem is that we use an int to represent the total size in bytes, and
    the product overflows on large segments. The bug snuck in during the work to
    enable int8 values, which switched a long value to an int.
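
The arithmetic in the error message above can be reproduced in isolation. The
following is a minimal sketch, not the reader's actual code: the class name
VectorSizeOverflow and the two helper methods are hypothetical, and only the
input values (size=999369, dim=768, byteSize=4) and the two multiplication
styles come from this commit.

```java
// Hypothetical standalone demo; names are illustrative, not part of Lucene.
public class VectorSizeOverflow {

    // Mirrors the removed line: every operand is an int, so the product
    // is computed in 32 bits and silently wraps around on overflow.
    static int numBytesInt(int size, int dimension, int byteSize) {
        return size * dimension * byteSize;
    }

    // Mirrors the added lines: widen to long before multiplying, and use
    // Math.multiplyExact so a long overflow would throw ArithmeticException
    // instead of wrapping.
    static long numBytesLong(int size, int dimension, int byteSize) {
        long vectorBytes = Math.multiplyExact((long) dimension, byteSize);
        return Math.multiplyExact(vectorBytes, size);
    }

    public static void main(String[] args) {
        // Values taken from the error message in the commit description.
        int size = 999369;
        int dimension = 768;
        int byteSize = Float.BYTES; // 4

        System.out.println(numBytesInt(size, dimension, byteSize));  // prints -1224905728
        System.out.println(numBytesLong(size, dimension, byteSize)); // prints 3070061568
    }
}
```

The long result matches the "Vector data length 3070061568" reported in the
exception, while the int result matches the negative value it was compared
against.
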
---
 lucene/CHANGES.txt                                                | 8 ++++++++
 .../apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java  | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index 9ad8ba23b04..782e0662c82 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -9,6 +9,14 @@ Bug Fixes
 ---------------------
 (No changes)
 
+======================== Lucene 9.4.1 =======================
+
+Bug Fixes
+---------------------
+* GITHUB#11858: Fix kNN vectors format validation on large segments. This
+ addresses a regression in 9.4.0 where validation could fail, preventing
+ further writes or searches on the index. (Julie Tibshirani)
+
 ======================== Lucene 9.4.0 =======================
 
 API Changes
diff --git a/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java b/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java
index 1d02d818f8b..035be10e41a 100644
--- a/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java
+++ b/lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java
@@ -180,7 +180,9 @@ public final class Lucene94HnswVectorsReader extends KnnVectorsReader {
         byteSize = Float.BYTES;
         break;
     }
-    int numBytes = fieldEntry.size * dimension * byteSize;
+    long vectorBytes = Math.multiplyExact((long) dimension, byteSize);
+    long numBytes = Math.multiplyExact(vectorBytes, fieldEntry.size);
+
     if (numBytes != fieldEntry.vectorDataLength) {
       throw new IllegalStateException(
           "Vector data length "