Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2015/10/28 02:34:27 UTC
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

    [ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977537#comment-14977537 ] 

Erick Erickson commented on SOLR-8220:
--------------------------------------

The only surprising behavior I see with this approach is that indexing 2.0 could return different values depending on whether the value comes from docValues or from the stored field. The docValues version might come back as something like 1.9999999999999999, and even 2.0000000000000 would be a "surprise".
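To make the concern concrete, here is a minimal Java sketch. It assumes a double is kept in docValues as its raw IEEE-754 bits packed into a sortable long. The helpers `toSortableLong` and `fromSortableLong` are hypothetical and only mimic the idea of a sortable encoding; they are not Lucene's API. In this sketch the bit pattern round-trips exactly, and a visible difference shows up only when the value passes through another type, e.g. a float widened to a double.

```java
public class DocValuesRoundTrip {
    // Hypothetical helper: reinterpret a double's IEEE-754 bits as a long and
    // flip the non-sign bits of negative values so the longs sort like doubles.
    static long toSortableLong(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    // The bit flip is its own inverse, so decoding reverses it exactly.
    static double fromSortableLong(long sortable) {
        long bits = sortable ^ ((sortable >> 63) & 0x7fffffffffffffffL);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double indexed = 2.0;
        // Exact round trip through the sortable encoding.
        System.out.println(fromSortableLong(toSortableLong(indexed))); // 2.0

        // A float widened to double prints differently from the float itself.
        float f = 2.1f;
        System.out.println(f);          // 2.1
        System.out.println((double) f); // 2.0999999046325684
    }
}
```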

I'm not against the idea, just want this out there.

And there's one side benefit that's not entirely obvious. In sharded setups, the first pass returns the candidate list: each document's ID plus its "sort criteria" values. Last I knew, this was written to return stored values, which requires decompression because it has to load the stored fields. If all the sort fields had docValues, we wouldn't have to do this.

This can't be the complete story, since you can index but not store a sort field and distributed sorting still works, but it's one path I believe I've seen. It's an open question how to wire this into standard search for a field that is both stored and has docValues.
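A minimal Java sketch of the side benefit above: when every sort field has docValues, the first-pass sort values can be collected per document without loading (and decompressing) the stored document. The class, the `FieldInfo` record, and the in-memory maps are hypothetical stand-ins, not Solr's actual code path; `storedLoads` simply counts how often a stored document would have been decompressed.

```java
import java.util.*;

public class FirstPassSortValues {
    // Hypothetical schema flags for a sort field (not Solr's SchemaField API).
    record FieldInfo(String name, boolean stored, boolean docValues) {}

    // Hypothetical column store: field name -> (doc id -> value).
    final Map<String, Map<Integer, Object>> docValues = new HashMap<>();
    // Hypothetical stored-document store: doc id -> (field name -> value).
    final Map<Integer, Map<String, Object>> storedDocs = new HashMap<>();
    // Number of stored documents loaded (each load implies decompression).
    int storedLoads = 0;

    void putDocValue(String field, int docId, Object value) {
        docValues.computeIfAbsent(field, k -> new HashMap<>()).put(docId, value);
    }

    void putStored(int docId, String field, Object value) {
        storedDocs.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
    }

    // Collects the sort values for one candidate doc: docValues when the field
    // has them, otherwise the stored document, loaded at most once per doc.
    List<Object> sortValues(int docId, List<FieldInfo> sortFields) {
        List<Object> values = new ArrayList<>();
        Map<String, Object> stored = null;
        for (FieldInfo f : sortFields) {
            if (f.docValues()) {
                values.add(docValues.getOrDefault(f.name(), Map.of()).get(docId));
            } else {
                if (stored == null) {
                    storedLoads++;
                    stored = storedDocs.getOrDefault(docId, Map.of());
                }
                values.add(stored.get(f.name()));
            }
        }
        return values;
    }
}
```

If any sort field lacks docValues, the stored document is still loaded once for that candidate, so in this model the saving only materializes when all the sort fields are DV.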

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true", which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc.), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues: facets, analytics, streaming, etc. all seem to do it their own way, so perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - (2a) fl="docValueField"
>   -- return the field from docValues if it is not stored but has docValues; if the field is stored, return it from stored fields
> - (2b) fl="*"
>   -- return only stored fields
> - (2c) fl="+"
>   -- return stored fields and docValues fields
> 2a would be the easiest implementation and might be sufficient for a first pass. 2b is the current behavior.
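The proposed `fl` rules can be sketched as a small resolution function. This is a hypothetical Java sketch, not Solr's implementation: the `Field` record stands in for schema flags, `+` is the syntax proposed in the issue rather than an existing Solr feature, and only a single `fl` token is handled for simplicity.

```java
import java.util.*;

public class FlResolver {
    // Hypothetical schema flags for a field (not Solr's SchemaField API).
    record Field(String name, boolean stored, boolean docValues) {}

    enum Source { STORED, DOC_VALUES }

    // Decides which fields to return, and from where, for one fl token,
    // following the rules proposed in the issue description.
    static Map<String, Source> resolve(String fl, List<Field> schema) {
        Map<String, Source> out = new LinkedHashMap<>();
        for (Field f : schema) {
            boolean wanted = fl.equals("*") || fl.equals("+") || fl.equals(f.name());
            if (!wanted) continue;
            if (f.stored()) {
                // A stored field is always returned from stored fields.
                out.put(f.name(), Source.STORED);
            } else if (f.docValues() && !fl.equals("*")) {
                // "*" keeps the current stored-only behavior; an explicit
                // name or "+" falls back to docValues.
                out.put(f.name(), Source.DOC_VALUES);
            }
        }
        return out;
    }
}
```

Keeping `*` stored-only means existing queries return exactly what they return today, while explicit field names and `+` opt in to the docValues fallback.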

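The multiValued caveat in the issue description can be illustrated with a small Java model. It assumes string fields use SORTED_SET-style docValues (values come back in sorted order, duplicates collapsed) and numeric fields use SORTED_NUMERIC-style docValues (sorted, duplicates kept). Both methods are hypothetical simplifications rather than Lucene's API, and Java's `String` ordering only approximates byte order outside ASCII.

```java
import java.util.*;

public class MultiValuedOrder {
    // Models a SORTED_SET-style read: insertion order is lost, values come
    // back sorted, and duplicate values collapse into one.
    static List<String> fromSortedSetDocValues(List<String> indexed) {
        return new ArrayList<>(new TreeSet<>(indexed));
    }

    // Models a SORTED_NUMERIC-style read: sorted ascending, duplicates kept.
    static List<Long> fromSortedNumericDocValues(List<Long> indexed) {
        List<Long> copy = new ArrayList<>(indexed);
        Collections.sort(copy);
        return copy;
    }
}
```

For example, a document indexed with colors `red, blue, red` would come back from the stored field in that order, but as `blue, red` in this docValues model.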

