Posted to issues@lucene.apache.org by "jpountz (via GitHub)" <gi...@apache.org> on 2023/02/10 17:04:27 UTC

[GitHub] [lucene] jpountz opened a new issue, #12142: Separate index/store document APIs take 2?

jpountz opened a new issue, #12142:
URL: https://github.com/apache/lucene/issues/12142

   ### Description
   
   As @rmuir managed to make me look into reducing the amount of guessing we're doing in our document API, I think that a requirement for doing it right will be to split our index and store document APIs. Currently, `Document` and `IndexableField` are trying to cover both, which creates a confusing API.
   
   For the store API, I wonder if we still need a Document abstraction. Maybe we could return a simple `List<StoredValue>` and push the work onto users to convert it into a map if they want to be able to access fields by name. This is something that is easy to do with Java streams nowadays.
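
   A minimal sketch of the stream-based conversion mentioned above. The `StoredValue` record here is a hypothetical name/value pair, since the issue does not define its shape. Because a field name may occur more than once in a document, the sketch groups values per name rather than keeping a single value.

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.stream.Collectors;

   class StoredValuesToMap {

     // Hypothetical shape: the issue does not define StoredValue, so a name/value pair is assumed.
     record StoredValue(String name, Object value) {}

     // Groups values by field name, preserving first-seen field order.
     // A field may occur several times in one document, hence Map<String, List<Object>>.
     static Map<String, List<Object>> byName(List<StoredValue> values) {
       return values.stream()
           .collect(Collectors.groupingBy(
               StoredValue::name,
               LinkedHashMap::new,
               Collectors.mapping(StoredValue::value, Collectors.toList())));
     }

     public static void main(String[] args) {
       List<StoredValue> doc = List.of(
           new StoredValue("title", "Lucene in Action"),
           new StoredValue("tag", "search"),
           new StoredValue("tag", "java"));
       System.out.println(byName(doc)); // {title=[Lucene in Action], tag=[search, java]}
     }
   }
   ```

   Users who know a field is single-valued could use `Collectors.toMap` instead; grouping is just the safer default for this sketch.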
   
   For the index API, I'd like to model `IndexableField` according to how it's getting consumed by `IndexingChain`. Something like:
   
   ```java
   interface IndexableField {
   
     // Populates the inverted index; IndexableValue wraps either a BytesRef or a TokenStream
     IndexableValue invertedValue(Analyzer analyzer, TokenStream reuse);
   
     // Populates points
     byte[] pointValue();
   
     // Populates NUMERIC or SORTED_NUMERIC doc values
     long numericDocValue();
   
     // Populates BINARY, SORTED or SORTED_SET doc values
     BytesRef docValue();
   
     // Populates stored fields
     StoredValue storedValue();
   
     // Populates byte[] vectors
     byte[] binaryVectorValue();
   
     // Populates float[] vectors
     float[] floatVectorValue();
   
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12142: Separate index/store document APIs take 2?

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12142:
URL: https://github.com/apache/lucene/issues/12142#issuecomment-1426145382

   I want the additional type safety of things like the Field classes that users use, but I think that, at a high level, the Document/Field API is a good and intuitive model for end users.
   
   So I'm not sure about the value of introducing different apis or more separation.
   You can already avoid using Document if you don't want to use that.
   
   I don't agree with all the methods on Document today that pretend to offer map-like access but are really linear-time searches through a list, though. I would support deprecating all of those.
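
   To make the cost difference concrete, here is a small self-contained sketch. `SimpleField`, `getByScan` and `index` are hypothetical stand-ins written for this illustration, not Lucene's `Document` or `IndexableField` code; the point is only that a by-name lookup over a list walks the list on every call, while a map built once does not.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   class ListBackedLookup {

     // Hypothetical stand-in for a field; not Lucene's IndexableField.
     record SimpleField(String name, String value) {}

     // Map-like signature, but every call walks the list: O(n) per lookup.
     static String getByScan(List<SimpleField> fields, String name) {
       for (SimpleField f : fields) {
         if (f.name().equals(name)) {
           return f.value();
         }
       }
       return null;
     }

     // Builds a real index once (first value wins, like the scan); lookups are then O(1) on average.
     static Map<String, String> index(List<SimpleField> fields) {
       Map<String, String> byName = new HashMap<>();
       for (SimpleField f : fields) {
         byName.putIfAbsent(f.name(), f.value());
       }
       return byName;
     }

     public static void main(String[] args) {
       List<SimpleField> fields = new ArrayList<>();
       for (int i = 0; i < 1000; i++) {
         fields.add(new SimpleField("f" + i, "v" + i));
       }
       System.out.println(getByScan(fields, "f999")); // v999, found after scanning all 1000 entries
       System.out.println(index(fields).get("f999")); // v999, a single hash lookup
     }
   }
   ```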




[GitHub] [lucene] jpountz commented on issue #12142: Separate index/store document APIs take 2?

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on issue #12142:
URL: https://github.com/apache/lucene/issues/12142#issuecomment-1426159441

   > I'm not sure about the value of introducing different apis or more separation.
   
   The issue in my mind is that `IndexableField` has something called `storedValue` for indexing, which is never populated when retrieving stored fields, and it also has a `numericValue()` for retrieval that can be any number type, but that `IndexingChain` always treats as a long.
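
   A small plain-Java sketch of the hazard in the `numericValue()` case, with no Lucene classes involved. `consumeAsLong` is a hypothetical stand-in for a consumer that reads every `Number` as a long; it is not `IndexingChain`'s actual code.

   ```java
   class NumberAsLong {

     // Hypothetical consumer that, like the behavior described above, reads every Number as a long.
     static long consumeAsLong(Number value) {
       return value.longValue();
     }

     public static void main(String[] args) {
       System.out.println(consumeAsLong(42L));   // 42
       System.out.println(consumeAsLong(3.75d)); // 3: the fractional part is silently dropped

       // Encoding the double explicitly keeps all of its bits in the long.
       long bits = Double.doubleToLongBits(3.75d);
       System.out.println(Double.longBitsToDouble(bits)); // 3.75
     }
   }
   ```

   An accessor that returns a primitive `long` makes the caller pick the encoding up front instead of leaving it to guesswork.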
   
   > I don't agree with all the methods on Document today that pretend to offer map-like access but are really linear-time searches through a list, though. I would support deprecating all of those.
   
   +1




[GitHub] [lucene] ArtemLukanin-TomTom commented on issue #12142: Separate index/store document APIs take 2?

Posted by "ArtemLukanin-TomTom (via GitHub)" <gi...@apache.org>.
ArtemLukanin-TomTom commented on issue #12142:
URL: https://github.com/apache/lucene/issues/12142#issuecomment-1666612546

   I feel that this discussion is closely related to my comment at https://github.com/apache/lucene/issues/10374#issuecomment-1666612060




[GitHub] [lucene] rmuir commented on issue #12142: Separate index/store document APIs take 2?

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12142:
URL: https://github.com/apache/lucene/issues/12142#issuecomment-1426209181

   Yeah, and just to clarify at a high level:
   
   Indexing time:
   * I think the Document/Field API should be geared toward this. Remove the ability for a document to be a "map" backed by a slow list. Try to add type safety to the Field classes, like we have been doing. I do think storing should be easy here (e.g. `Field.Store.YES`, like people are accustomed to).
   
   Retrieval time:
   * Should be something different, probably map-like for the "default/easy" case, but
   * let's put some serious thought into naming first, before we try this again.
   
   




[GitHub] [lucene] rmuir commented on issue #12142: Separate index/store document APIs take 2?

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12142:
URL: https://github.com/apache/lucene/issues/12142#issuecomment-1426155287

   Honestly, I think a big challenge is naming. Last time we tried this, the name was `StoredDocument`, but I think that name is confusing. Too bad we created a class named `StoredFields` in #11998, as I almost think that would be the perfect name for what you should "get back from the index".




[GitHub] [lucene] rmuir commented on issue #12142: Separate index/store document APIs take 2?

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12142:
URL: https://github.com/apache/lucene/issues/12142#issuecomment-1426151437

   As for what gets constructed by the "default" / "easy" stored fields visitor, I do think that one should be a Map and not be conflated with the Document used for indexing.
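
   A rough sketch of what a map-building "easy" visitor could look like. `SimpleStoredFieldVisitor` is a hypothetical, simplified callback interface invented for this example; Lucene's real `StoredFieldVisitor` has a different signature, and the `ToMap` name is likewise an assumption.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   class MapCollectingVisitor {

     // Hypothetical, simplified callback interface; Lucene's real StoredFieldVisitor differs.
     interface SimpleStoredFieldVisitor {
       void stringField(String name, String value);
       void longField(String name, long value);
     }

     // Collects every visited value into a map keyed by field name.
     static final class ToMap implements SimpleStoredFieldVisitor {
       final Map<String, List<Object>> fields = new LinkedHashMap<>();

       @Override
       public void stringField(String name, String value) {
         add(name, value);
       }

       @Override
       public void longField(String name, long value) {
         add(name, value);
       }

       private void add(String name, Object value) {
         fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
       }
     }

     public static void main(String[] args) {
       ToMap visitor = new ToMap();
       // A real reader would drive these callbacks; here they are invoked by hand.
       visitor.stringField("title", "hello");
       visitor.longField("price", 42L);
       visitor.stringField("tag", "a");
       visitor.stringField("tag", "b");
       System.out.println(visitor.fields); // {title=[hello], price=[42], tag=[a, b]}
     }
   }
   ```

   The result is a plain map that has nothing to do with the indexing-side Document, which is the separation argued for above.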

