Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2014/03/21 00:41:48 UTC

[jira] [Commented] (LUCENE-5542) Explore making DVConsumer sparse-aware

    [ https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942530#comment-13942530 ] 

Robert Muir commented on LUCENE-5542:
-------------------------------------

The codec can already decide how to encode the values. Making the API more complicated doesn't seem to buy us anything. I'm open to a benchmark showing this, but I'm not seeing it.

> Explore making DVConsumer sparse-aware
> --------------------------------------
>
>                 Key: LUCENE-5542
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5542
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shai Erera
>
> Today the DVConsumer API requires the caller to pass a value for every document, where {{null}} means "this doc has no value". The Codec can then choose how to encode the values, i.e. whether it encodes a 0 for a numeric field or encodes only the sparse docs. In practice, from what I see, we choose to encode the 0s.
> I wonder whether adding, e.g., an {{Iterable<Number>}} to DVConsumer.addXYZField() would make a better API. The caller would pass only <doc,value> pairs, and it would be up to the Codec to decide how to encode the missing values. For example, if a user's app truly has a sparse NDV, IndexWriter wouldn't need to "fill the gaps" artificially. That is the job of the Codec.
> To be clear, I don't propose to change any Codec implementation in this issue (w.r.t. sparse encoding - yes/no), only to change the API to reflect that sparseness. I think that if we ever want to encode sparse values, it will be a more convenient API.
> Thoughts? I volunteer to do this work, but I want to get others' opinions before I start.
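
For illustration only, the two shapes discussed in the description (one value per document with null for "missing", versus only <doc,value> pairs) might be sketched as below. This is a minimal standalone sketch, not Lucene code: the class SparseDocValuesSketch and the methods toSparse and toDense are hypothetical names invented here, and the choice of 0 as the fill value simply mirrors the "encode the 0s" behavior the description mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names exist in Lucene.
class SparseDocValuesSketch {

    // Dense shape -> sparse shape: keep only the docs that have a value,
    // keyed by doc ID (the list index stands in for the doc ID).
    public static TreeMap<Integer, Long> toSparse(List<Long> dense) {
        TreeMap<Integer, Long> sparse = new TreeMap<>();
        for (int docId = 0; docId < dense.size(); docId++) {
            Long value = dense.get(docId);
            if (value != null) {
                sparse.put(docId, value);
            }
        }
        return sparse;
    }

    // Sparse shape -> dense shape: "fill the gaps" with missingValue so that
    // every doc in [0, maxDoc) has an entry.
    public static List<Long> toDense(Map<Integer, Long> sparse, int maxDoc, long missingValue) {
        List<Long> dense = new ArrayList<>(maxDoc);
        for (int docId = 0; docId < maxDoc; docId++) {
            dense.add(sparse.getOrDefault(docId, missingValue));
        }
        return dense;
    }

    public static void main(String[] args) {
        // Five docs; only docs 0 and 3 have a value.
        List<Long> dense = Arrays.asList(7L, null, null, 42L, null);
        TreeMap<Integer, Long> sparse = toSparse(dense);
        System.out.println(sparse);                            // {0=7, 3=42}
        System.out.println(toDense(sparse, dense.size(), 0L)); // [7, 0, 0, 42, 0]
    }
}
```

Either shape carries the same information as long as the consumer also knows the document count, which is why the question in this thread is about API convenience rather than about what a codec is able to encode.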


