Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2015/12/01 18:04:11 UTC

[jira] [Comment Edited] (SOLR-8220) Read field from docValues for non stored fields

    [ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034013#comment-15034013 ] 

Erick Erickson edited comment on SOLR-8220 at 12/1/15 5:04 PM:
---------------------------------------------------------------

Deleting, see discussion at SOLR-8344


was (Author: erickerickson):
Random thought that occurred to me before coffee, so be warned.

The initial statement here is 'Many times a value will be both stored="true" and docValues="true"', and then there was a lot of discussion about efficiencies, etc.

Why are we trying to anticipate "the right thing to do"? It would be simpler to code something like:
> If the field is stored=true, return the stored value (don't even need to look whether DV is true or not).
> If the field is stored=false and docValues=true, return the DV value.
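
As a minimal sketch of that rule: the class, enum, and method names below are hypothetical and are not Solr's actual API. The two booleans just stand in for the field's stored and docValues attributes from the schema.

```java
// Hypothetical illustration only; none of these names exist in Solr.
class FieldValueSource {

    /** Where a returned field value would be read from. */
    enum Source { STORED, DOC_VALUES, NONE }

    /**
     * Picks the source purely from the schema definition.
     *
     * @param stored    the field's stored attribute
     * @param docValues the field's docValues attribute
     */
    static Source choose(boolean stored, boolean docValues) {
        if (stored) {
            // stored=true always wins; docValues is not even consulted.
            return Source.STORED;
        }
        if (docValues) {
            // stored=false and docValues=true: fall back to the DV value.
            return Source.DOC_VALUES;
        }
        // Neither attribute is set, so there is nothing to return.
        return Source.NONE;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true));   // STORED
        System.out.println(choose(false, true));  // DOC_VALUES
        System.out.println(choose(false, false)); // NONE
    }
}
```

Nothing is guessed here: the schema alone decides which branch is taken.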

Now which path is chosen is totally under the user's control through the schema definition; we don't have to try to guess anything. No new syntax, though maybe a new "best practice" or something. There would be a learning curve for users around using only docValues=true for efficiency and _not_ setting stored=true. Not quite sure what to do if the user defines both, however; perhaps use the stored value?

The thing one does lose is the ability to get 2 and 1.999999999999 from the _same_ field, so there would be the added burden on the user of having to define two fields, one stored-only and one DV-only, if that distinction was important.

And in the "wild and crazy" department (and for a different ticket _entirely_) we could consider disallowing fields with both docValues=true and stored=true. Not advocating this last one, just throwing it out for discussion.

Let me emphasize that I don't have any investment in doing things this way, and apologies for thinking of this so late in the game.

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues; facets, analytics, streaming, etc. all seem to do it their own way. Perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> 2a) fl="docValueField"
>   -- if the field is stored, return it from stored fields; if it is not stored but has docValues, return it from docValues
> 2b) fl="*"
>   -- return only stored fields
> 2c) fl="+"
>   -- return stored fields and docValues fields
> 2a would be the easiest implementation and might be sufficient for a first pass. 2b is the current behavior.
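
The three proposed fl behaviors in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions: the FlSketch class, its Field type, and the resolve method are hypothetical and are not part of Solr, and the sketch handles a single fl token only (no comma-separated lists, globs, or transformers).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Solr.
class FlSketch {

    /** Minimal stand-in for a schema field definition. */
    static final class Field {
        final String name;
        final boolean stored;
        final boolean docValues;

        Field(String name, boolean stored, boolean docValues) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    /** Returns the names of the fields a single fl token would return. */
    static List<String> resolve(String fl, List<Field> schema) {
        List<String> out = new ArrayList<>();
        for (Field f : schema) {
            boolean returnable;
            if (fl.equals("*")) {
                returnable = f.stored;                 // only stored fields
            } else if (fl.equals("+")) {
                returnable = f.stored || f.docValues;  // stored and docValues fields
            } else {
                // Explicitly named field: usable if it is stored or has docValues.
                returnable = f.name.equals(fl) && (f.stored || f.docValues);
            }
            if (returnable) {
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> schema = Arrays.asList(
                new Field("title", true, false),
                new Field("price", false, true),
                new Field("id", true, true));
        System.out.println(resolve("*", schema));     // [title, id]
        System.out.println(resolve("+", schema));     // [title, price, id]
        System.out.println(resolve("price", schema)); // [price]
    }
}
```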



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org