Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/08/26 14:50:44 UTC
[GitHub] [lucene] msokolov commented on a change in pull request #262: LUCENE-10063: implement SimpleTextKnnvectorsReader.search

msokolov commented on a change in pull request #262:
URL: https://github.com/apache/lucene/pull/262#discussion_r696706087



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextKnnVectorsReader.java
##########
@@ -140,7 +147,38 @@ public VectorValues getVectorValues(String field) throws IOException {
 
   @Override
   public TopDocs search(String field, float[] target, int k, Bits acceptDocs) throws IOException {
-    throw new UnsupportedOperationException();
+    VectorValues values = getVectorValues(field);

Review comment:
       Thanks, let's follow the convention of relying on callers to do such checking then.

##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextKnnVectorsReader.java
##########
@@ -140,7 +147,38 @@ public VectorValues getVectorValues(String field) throws IOException {
 
   @Override
   public TopDocs search(String field, float[] target, int k, Bits acceptDocs) throws IOException {
-    throw new UnsupportedOperationException();
+    VectorValues values = getVectorValues(field);
+    if (values == null) {
+      return null;
+    }
+    if (target.length != values.dimension()) {
+      throw new IllegalArgumentException(
+          "incorrect dimension for field "
+              + field
+              + "; expected "
+              + values.dimension()
+              + " but target has "
+              + target.length);
+    }
+    FieldInfo info = readState.fieldInfos.fieldInfo(field);
+    VectorSimilarityFunction vectorSimilarity = info.getVectorSimilarityFunction();
+    HitQueue topK = new HitQueue(k, false);
+    int doc;
+    while ((doc = values.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+      float[] vector = values.vectorValue();
+      float score = vectorSimilarity.compare(vector, target);
+      if (vectorSimilarity.reversed) {
+        score = 1 / (score + 1);
+      }
+      topK.insertWithOverflow(new ScoreDoc(doc, score));
+    }
+    ScoreDoc[] topScoreDocs = new ScoreDoc[topK.size()];
+    int i = 0;
+    for (ScoreDoc scoreDoc : topK) {
+      topScoreDocs[i++] = scoreDoc;
+    }
+    Arrays.sort(topScoreDocs, Comparator.comparingInt(x -> x.doc));

Review comment:
       No, that's exactly right: the results should be sorted by score here, not by docid. I was confused, having just written the Query implementation. I do see that the `KnnVectorsReader` javadoc doesn't explicitly state what the contract is supposed to be; let's rectify that in a separate issue.
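
       For illustration only, here is a minimal sketch of ordering the collected hits by descending score instead of by docid. It is not the code that was merged: `Hit` is a hypothetical stand-in for Lucene's `ScoreDoc` so that the example compiles without Lucene on the classpath, and breaking ties by ascending docid is an assumption, not something this thread or the javadoc specifies.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortHitsByScore {

  /** Hypothetical stand-in for Lucene's ScoreDoc (a doc id plus a score). */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Returns a copy of the hits ordered best-first: highest score first.
   * Ties are broken by ascending doc id (an assumption made for this sketch).
   */
  static Hit[] sortByScore(Hit[] hits) {
    Hit[] sorted = hits.clone();
    Arrays.sort(
        sorted,
        Comparator.comparingDouble((Hit h) -> h.score)
            .reversed()
            .thenComparingInt(h -> h.doc));
    return sorted;
  }

  public static void main(String[] args) {
    Hit[] hits = {new Hit(3, 0.5f), new Hit(7, 0.9f), new Hit(1, 0.5f)};
    for (Hit h : sortByScore(hits)) {
      System.out.println("doc=" + h.doc + " score=" + h.score);
    }
  }
}
```

       In the patch itself, the equivalent change would replace the `Comparator.comparingInt(x -> x.doc)` argument with a comparator on `score`, assuming the intended contract is best-scoring hits first.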




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


