You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/29 20:12:25 UTC

[GitHub] [lucene] shubhamvishu opened a new pull request, #11832: Added static factory method for loading VectorValues

shubhamvishu opened a new pull request, #11832:
URL: https://github.com/apache/lucene/pull/11832

   I have added a static factory method `getVectorValues` in `VectorValues` which returns the EMPTY `VectorValues` instance if the field is not found in the segment or if its not a vector field it throws an IAE else returns the  `VectorValues` instance. Looking forward to the feedback.
   <!--
   If this is your first contribution to Lucene, please make sure you have reviewed the contribution guide.
   https://github.com/apache/lucene/blob/main/CONTRIBUTING.md
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11832:
URL: https://github.com/apache/lucene/pull/11832#issuecomment-1282020859

   @shubhamvishu I'm sorry that your work didn't result in a merged commit, but it feels like it would be better not to merge this change. Thank you again for looking into this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shubhamvishu commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
shubhamvishu commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r984717340


##########
lucene/core/src/java/org/apache/lucene/index/VectorValues.java:
##########
@@ -35,6 +35,25 @@ public abstract class VectorValues extends DocIdSetIterator {
   /** Sole constructor */
   protected VectorValues() {}
 
+  /**
+   * Returns the {@link VectorValues} index for this field, or {@link #EMPTY} if it has none.
+   *
+   * @param reader Leaf reader instance
+   * @param field Field name
+   * @return VectorValues instance, or an empty instance if {@code field} does not exist in this
+   *     reader
+   * @throws IOException if the field does not have any vector values
+   */
+  public static VectorValues getVectorValues(LeafReader reader, String field) throws IOException {
+    VectorValues values = reader.getVectorValues(field);
+    if (values == null) {
+      return EMPTY;
+    } else if (!reader.getFieldInfos().fieldInfo(field).hasVectorValues()) {

Review Comment:
   Got it....Thanks @rmuir 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r991658811


##########
lucene/core/src/java/org/apache/lucene/index/VectorValues.java:
##########
@@ -35,6 +35,27 @@ public abstract class VectorValues extends DocIdSetIterator {
   /** Sole constructor */
   protected VectorValues() {}
 
+  /**
+   * Returns the {@link VectorValues} instance for this field, or {@link #EMPTY} if it has none.
+   *
+   * @param reader Leaf reader instance
+   * @param field Field name
+   * @return VectorValues instance, or an empty instance if {@code field} does not exist in this
+   *     reader
+   * @throws IOException if the field exists but does not have any vector values
+   */
+  public static VectorValues getVectorValues(LeafReader reader, String field) throws IOException {
+    VectorValues values = reader.getVectorValues(field);
+    if (values == null) {
+      FieldInfo fieldInfo = reader.getFieldInfos().fieldInfo(field);
+      if (fieldInfo != null && !fieldInfo.hasVectorValues()) {

Review Comment:
   I think I'm a little confused here. My understanding (which admittedly, is quite limited with the vector stuff) is that `reader.getVectorValues` will return `null` if the field doesn't exist, or if the field exists but no vectors are indexed for it. So shouldn't we check for both of these cases here? For example, what should happen if a user adds some NUMERIC doc value field called "foo" when indexing but then calls `VectorValues#getVectorValues(reader, "foo")`? Wouldn't throwing an IAE in that situation be consistent with our `DocValues` static factory methods (e.g., `DocValues#getNumeric` will throw an IAE if the field exists, but is of a non-numeric type).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11832:
URL: https://github.com/apache/lucene/pull/11832#issuecomment-1269629285

   > One reasoning is we can have this to be consistent like how we have similar static functions in DocValues. There no code using it right now but this could be used maybe in future?
   
   I'm starting to have more second thoughts about whether we should actually add this new method. On doc values, I'm seeing value because it helps save a few annoying null checks and then everything should work as expected on the empty instances. Saving null checks applies for vectors too, but the `EMPTY` vectors instance also returns `0` for the `dimension()` method, which is something I could see some code paths to complain about if they end up seeing different values for the dimension across different segments of the same index. So it's not clear to me that it would prevent more bugs than it would introduce.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shubhamvishu commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
shubhamvishu commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r988712945


##########
lucene/core/src/java/org/apache/lucene/index/SlowCodecReaderWrapper.java:
##########
@@ -163,7 +163,7 @@ private static KnnVectorsReader readerToVectorReader(LeafReader reader) {
     return new KnnVectorsReader() {
       @Override
       public VectorValues getVectorValues(String field) throws IOException {
-        return reader.getVectorValues(field);
+        return VectorValues.getVectorValues(reader, field);

Review Comment:
   > The new method looks good, but it's no longer used anywhere under src/java anymore, which makes me wonder if it's actually helpful.
   
   One reasoning is we can have this to be consistent like how we have similar static functions in `DocValues`. There no code using it right now but this could be used maybe in future?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r991658811


##########
lucene/core/src/java/org/apache/lucene/index/VectorValues.java:
##########
@@ -35,6 +35,27 @@ public abstract class VectorValues extends DocIdSetIterator {
   /** Sole constructor */
   protected VectorValues() {}
 
+  /**
+   * Returns the {@link VectorValues} instance for this field, or {@link #EMPTY} if it has none.
+   *
+   * @param reader Leaf reader instance
+   * @param field Field name
+   * @return VectorValues instance, or an empty instance if {@code field} does not exist in this
+   *     reader
+   * @throws IOException if the field exists but does not have any vector values
+   */
+  public static VectorValues getVectorValues(LeafReader reader, String field) throws IOException {
+    VectorValues values = reader.getVectorValues(field);
+    if (values == null) {
+      FieldInfo fieldInfo = reader.getFieldInfos().fieldInfo(field);
+      if (fieldInfo != null && !fieldInfo.hasVectorValues()) {

Review Comment:
   I think I'm a little confused here. My understanding (which admittedly, is quite limited with the vector stuff) is that `reader.getVectorValues` will return `null` if the field doesn't exist, or if the field exists but no vectors are indexed for it. So shouldn't we check for both of these cases here? For example, what should happen if a user adds some NUMERIC doc value field called "foo" when indexing but then calls `VectorValues#getVectorValues(reader, "foo")`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz closed pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
jpountz closed pull request #11832: Added static factory method for loading VectorValues
URL: https://github.com/apache/lucene/pull/11832


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shubhamvishu commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
shubhamvishu commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r984718205


##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -2585,7 +2585,7 @@ public static Status.VectorValuesStatus testVectors(
                       + "\" has vector values but dimension is "
                       + dimension);
             }
-            VectorValues values = reader.getVectorValues(fieldInfo.name);
+            VectorValues values = VectorValues.getVectorValues(reader, fieldInfo.name);
             if (values == null) {

Review Comment:
   Yes....We can remove it. Thanks @gsmiller !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r984025608


##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -2585,7 +2585,7 @@ public static Status.VectorValuesStatus testVectors(
                       + "\" has vector values but dimension is "
                       + dimension);
             }
-            VectorValues values = reader.getVectorValues(fieldInfo.name);
+            VectorValues values = VectorValues.getVectorValues(reader, fieldInfo.name);
             if (values == null) {

Review Comment:
   We shouldn't need to check `values == null` here anymore right since `VectorValues#getVectorValues` never returns null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r991658811


##########
lucene/core/src/java/org/apache/lucene/index/VectorValues.java:
##########
@@ -35,6 +35,27 @@ public abstract class VectorValues extends DocIdSetIterator {
   /** Sole constructor */
   protected VectorValues() {}
 
+  /**
+   * Returns the {@link VectorValues} instance for this field, or {@link #EMPTY} if it has none.
+   *
+   * @param reader Leaf reader instance
+   * @param field Field name
+   * @return VectorValues instance, or an empty instance if {@code field} does not exist in this
+   *     reader
+   * @throws IOException if the field exists but does not have any vector values
+   */
+  public static VectorValues getVectorValues(LeafReader reader, String field) throws IOException {
+    VectorValues values = reader.getVectorValues(field);
+    if (values == null) {
+      FieldInfo fieldInfo = reader.getFieldInfos().fieldInfo(field);
+      if (fieldInfo != null && !fieldInfo.hasVectorValues()) {

Review Comment:
   I think I'm a little confused here. My understanding (which admittedly, is quite limited with the vector stuff) is that `reader.getVectorValues` will return `null` if the field doesn't exist, or if the field exists but no vectors are indexed for it. So shouldn't we check for both of these cases here? For example, what should happen if a user adds some NUMERIC doc value field called "foo" when indexing but then calls `reader.getVectorValues("foo")` on the index?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #11832:
URL: https://github.com/apache/lucene/pull/11832#issuecomment-1273864084

   > but the EMPTY vectors instance also returns 0 for the dimension() method, which is something I could see some code paths to complain about if they end up seeing different values for the dimension across different segments
   
   @jpountz +1. That's an interesting concern we don't have with doc values, and I could also see it causing issues/confusion for users. If we had internal usages for this (outside of tests), maybe we'd have more intuition around what makes more sense for users, but right now, we wouldn't really be a user of this functionality internally (as you point out), so it's really hard for me to know what would be most helpful to users here.
   
   I also took a closer look at a couple places where we're reading `VectorValues` as a user, in our application built on Lucene. In general, we're expecting the vectors to be of a consistent dimensionality, so we could stop checking for `null` with this change, but—as you point out—we'd need to start checking dimensionality, and that's probably more confusing.
   
   So, all things considered, without knowing more broadly what would be useful to our users, and having limited data points and intuition that returning this EMPTY VectorValues may be more confusing, I think it would make sense to hold off on adding this for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shubhamvishu commented on pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
shubhamvishu commented on PR #11832:
URL: https://github.com/apache/lucene/pull/11832#issuecomment-1283725512

   No problem at all. I get it it makes sense to not do this right now. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11832: Added static factory method for loading VectorValues

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r983993590


##########
lucene/core/src/java/org/apache/lucene/index/VectorValues.java:
##########
@@ -35,6 +35,25 @@ public abstract class VectorValues extends DocIdSetIterator {
   /** Sole constructor */
   protected VectorValues() {}
 
+  /**
+   * Returns the {@link VectorValues} index for this field, or {@link #EMPTY} if it has none.
+   *
+   * @param reader Leaf reader instance
+   * @param field Field name
+   * @return VectorValues instance, or an empty instance if {@code field} does not exist in this
+   *     reader
+   * @throws IOException if the field does not have any vector values
+   */
+  public static VectorValues getVectorValues(LeafReader reader, String field) throws IOException {
+    VectorValues values = reader.getVectorValues(field);
+    if (values == null) {
+      return EMPTY;
+    } else if (!reader.getFieldInfos().fieldInfo(field).hasVectorValues()) {

Review Comment:
   this line of code is unreachable. It is actually the null case where you want to check if the field exists, and if so, does it have vector values. See DocValues as an example.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org