You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/08/16 15:47:50 UTC

[GitHub] [lucene] jtibshirani commented on a change in pull request #241: LUCENE-10016: Added KnnVector index/query support to demo

jtibshirani commented on a change in pull request #241:
URL: https://github.com/apache/lucene/pull/241#discussion_r689652805



##########
File path: lucene/demo/src/java/org/apache/lucene/demo/SearchFiles.java
##########
@@ -242,4 +266,54 @@ public static void doPagingSearch(
       }
     }
   }
+
+  private static Query addSemanticQuery(Query query, KnnVectorDict vectorDict) throws IOException {
+    StringBuilder semanticQueryText = new StringBuilder();
+    QueryFieldTermExtractor termExtractor = new QueryFieldTermExtractor("contents");
+    query.visit(termExtractor);
+    for (String term : termExtractor.terms) {
+      semanticQueryText.append(term).append(' ');

Review comment:
       I found it surprising that this demo incorporates an analyzer -- I was expecting users to map directly from the query string or query terms to a vector. I'm guessing many users will be using sentence embedding models, which take a whole (untokenized) sentence as input and output a vector.
   
   In this demo, would it make sense to simplify and have a method like "createEmbeddingVector" that maps directly from a set of terms to a vector?

##########
File path: lucene/demo/src/java/org/apache/lucene/demo/SearchFiles.java
##########
@@ -242,4 +266,54 @@ public static void doPagingSearch(
       }
     }
   }
+
+  private static Query addSemanticQuery(Query query, KnnVectorDict vectorDict) throws IOException {
+    StringBuilder semanticQueryText = new StringBuilder();
+    QueryFieldTermExtractor termExtractor = new QueryFieldTermExtractor("contents");
+    query.visit(termExtractor);
+    for (String term : termExtractor.terms) {
+      semanticQueryText.append(term).append(' ');
+    }
+    if (semanticQueryText.length() > 0) {
+      KnnVectorQuery knnQuery =
+          new KnnVectorQuery(
+              "contents-vector",
+              new DemoKnnAnalyzer(vectorDict).analyze("text", semanticQueryText.toString()),
+              1);

Review comment:
       It might make sense to use a k greater than 1 here, to make it clear to users that they can retrieve any number of nearest vectors. Maybe we could even make it configurable instead of just having a flag to turn vectors on/ off.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org