Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2017/03/11 15:59:04 UTC

[jira] [Resolved] (LUCENE-6187) explore symmetric docvalues pull API

     [ https://issues.apache.org/jira/browse/LUCENE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-6187.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: master (7.0)

This was addressed by the switch to an iterator API for doc values in 7.0.
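
For readers unfamiliar with the iterator style: in 7.0, doc values are consumed by stepping through only the documents that have a value (nextDoc(), then longValue()), so missing values need neither null nor a separate Bits. Below is a minimal, self-contained sketch of that idea. It is a toy model, not the Lucene classes: IteratorSketch, ToyNumericDocValues, and the array-backed storage are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class IteratorSketch {

    /** Toy model of an iterator-style numeric doc values source (hypothetical, not the Lucene class). */
    static final class ToyNumericDocValues {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;

        private final int[] docs;    // sorted doc IDs that have a value
        private final long[] values; // values[i] belongs to docs[i]
        private int idx = -1;        // positioned before the first document

        ToyNumericDocValues(int[] docs, long[] values) {
            this.docs = docs;
            this.values = values;
        }

        /** Advances to the next document that has a value, or returns NO_MORE_DOCS. */
        int nextDoc() {
            if (idx < docs.length) {
                idx++;
            }
            return docID();
        }

        /** Current doc ID: -1 before iteration starts, NO_MORE_DOCS once exhausted. */
        int docID() {
            if (idx < 0) {
                return -1;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Value of the current document; only valid while positioned on a document. */
        long longValue() {
            return values[idx];
        }
    }

    /** Consumes the iterator in a single pass; documents without a value never appear. */
    static List<String> dump(ToyNumericDocValues dv) {
        List<String> out = new ArrayList<>();
        for (int doc = dv.nextDoc(); doc != ToyNumericDocValues.NO_MORE_DOCS; doc = dv.nextDoc()) {
            out.add(doc + "=" + dv.longValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Documents 0, 2 and 5 have values; 1, 3 and 4 are simply skipped.
        ToyNumericDocValues dv =
            new ToyNumericDocValues(new int[] {0, 2, 5}, new long[] {10L, 30L, 60L});
        System.out.println(dump(dv)); // prints [0=10, 2=30, 5=60]
    }
}
```

The consumer loop looks the same as a postings loop, which is the symmetry the issue title asks for.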

> explore symmetric docvalues pull API
> ------------------------------------
>
>                 Key: LUCENE-6187
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6187
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: master (7.0)
>
>
> Currently the DocValuesConsumer and NormsConsumer have a streaming pull API based on Iterable.
> {code}
> addNumericField(FieldInfo field, Iterable<Number> values)
> ...
> addSortedSetField(FieldInfo field, Iterable<BytesRef> values, Iterable<Number> docToOrdCount, Iterable<Number> ords)
> {code}
> I think this was a good initial approach, but it has a few downsides:
> * for more complex structures (sorted/sortedset/sortednumeric) the codec must awkwardly handle multiple streams and sometimes inefficiently do extra passes.
> * thousands of lines of XXXDocValues <-> Iterable bridge handling in merge code (when MultiDocValues already knows how to merge multiple subs)
> * representing missing values as null is awkward, complicated, and a little trappy for the consumer.
> I think we should explore changing it to look more like postings:
> {code}
> addNumericField(FieldInfo field, NumericDocValues values, Bits docsWithField)
> addSortedSetField(FieldInfo field, SortedSetDocValues values, Bits docsWithField)
> {code}
> I don't think it would be hard on the implementation: e.g. when I look at IndexWriter, it seems like these would even be simpler code than the current iterators (e.g. for numerics it's already got a NumericDocValues and a Bits docsWithField; the current iterable stuff is just "extra" bridge code, like merging).
> My main concern is whether it makes things easier for the codec impls or not. I think we have to try it out to see. We could test it out on trunk with just NormsConsumer.
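
To make the contrast in the description concrete, here is a rough sketch of the two consumer shapes it discusses: an Iterable<Number> where null marks a missing value, versus a value source paired with a docsWithField bit set. ToyNumericValues and ToyBits are hypothetical stand-ins defined here only for illustration; they are not the Lucene NumericDocValues or Bits types, and the summing logic is just an example consumer.

```java
import java.util.Arrays;
import java.util.List;

final class PullApiSketch {

    /** Toy stand-in for a random-access per-document value source (hypothetical). */
    interface ToyNumericValues {
        long get(int docID);
    }

    /** Toy stand-in for a bit set marking which documents have a value (hypothetical). */
    interface ToyBits {
        boolean get(int index);
        int length();
    }

    /** Iterable style: a null element means "this document has no value". */
    static long sumFromIterable(Iterable<Number> values) {
        long sum = 0;
        for (Number n : values) {
            if (n != null) { // forgetting this check is the trap for consumers
                sum += n.longValue();
            }
        }
        return sum;
    }

    /** Proposed style: values plus an explicit docsWithField bit set. */
    static long sumFromValues(ToyNumericValues values, ToyBits docsWithField) {
        long sum = 0;
        for (int doc = 0; doc < docsWithField.length(); doc++) {
            if (docsWithField.get(doc)) {
                sum += values.get(doc);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Documents 0 and 2 have values; document 1 has none.
        List<Number> iterable = Arrays.asList(5L, null, 7L);
        final long[] raw = {5L, 0L, 7L};
        final boolean[] present = {true, false, true};
        ToyBits bits = new ToyBits() {
            @Override public boolean get(int index) { return present[index]; }
            @Override public int length() { return present.length; }
        };
        System.out.println(sumFromIterable(iterable));            // prints 12
        System.out.println(sumFromValues(doc -> raw[doc], bits)); // prints 12
    }
}
```

Both consumers compute the same result; the difference is that the second never has to interpret null, since presence is carried by the bit set.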


