You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2015/12/01 16:30:11 UTC

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

    [ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033868#comment-15033868 ] 

Yonik Seeley commented on SOLR-8220:
------------------------------------

bq. It sounds like you are arguing for a common way to access docvalues and stored fields using the 'fl' parameter. I'm +1 to that.

Ah, ok... I mis-read your previous comment of "i.e. keep this issue focused on adding syntactic sugar to read field from doc values for non-stored fields" as advocating for new syntax for "fl" to load from non-stored docValue fields.

bq. Let's discuss this optimization in SOLR-8344 and keep the two issues separate.

I didn't really see it as separate (it depends on how you look at it),  I see it more as, we have a new feature that treats docValues as "column-stored".  What should the default behavior be when all requested fields are both column-stored and row-stored? I think we can make progress + commit this issue separately, but should still come at SOLR-8344 "fresh" (i.e. not put the burden of proof on one default more than the other).


> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org