Posted to dev@lucene.apache.org by "Keith Laban (JIRA)" <ji...@apache.org> on 2015/11/16 05:17:11 UTC
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields

     [ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Laban updated SOLR-8220:
------------------------------
    Attachment: SOLR-8220.patch

Adding an initial attempt at a patch for this. What does everyone think about taking an approach like this?

This patch decorates a document with values read from docValues after its stored values have been read. For now, it skips multiValued fields and reads from docValues only when FL is specified and the field is not stored.
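
A minimal sketch of the decoration step, assuming plain Java stand-ins so it runs without Solr on the classpath. SchemaField, DocValuesSource and decorate() are hypothetical names, not classes from the attached patch; a real version would read through Lucene's per-segment docValues readers rather than the DocValuesSource interface assumed here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the attached patch: SchemaField and DocValuesSource
// are illustrative stand-ins for Solr's schema and Lucene's docValues readers.
class DocValuesDecorator {

  /** The schema properties the decoration step looks at. */
  static final class SchemaField {
    final String name;
    final boolean stored;
    final boolean docValues;
    final boolean multiValued;

    SchemaField(String name, boolean stored, boolean docValues, boolean multiValued) {
      this.name = name;
      this.stored = stored;
      this.docValues = docValues;
      this.multiValued = multiValued;
    }
  }

  /** Single-valued docValues lookup; returns null when the document has no value. */
  interface DocValuesSource {
    Object get(String field, int docId);
  }

  /**
   * Copies the stored fields, then adds a docValues value for each requested
   * field that is unstored, has docValues, and is single-valued.
   */
  static Map<String, Object> decorate(Map<String, Object> storedFields, int docId,
                                      List<String> requestedFields,
                                      Map<String, SchemaField> schema,
                                      DocValuesSource docValues) {
    Map<String, Object> doc = new LinkedHashMap<>(storedFields);
    for (String name : requestedFields) {
      SchemaField sf = schema.get(name);
      // Skip unknown fields, stored fields (already read), fields without
      // docValues, and multiValued fields (not handled yet).
      if (sf == null || sf.stored || !sf.docValues || sf.multiValued) {
        continue;
      }
      Object value = docValues.get(name, docId);
      if (value != null) {
        doc.put(name, value);
      }
    }
    return doc;
  }
}
```

Because the stored fields are copied first and only unstored fields are added afterwards, a field that is both stored and has docValues is never read twice.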

Some problems I noticed with this approach are:

1) LazyDocument doesn't support a notion of loading from docValues, and without
mucking around in there I can't see a way to apply the docValues before caching,
because of the various FLs.

2) There is no metadata (that I can find) stored for each document that says
whether it has an unstored docValue field, so efficiently loading docValues
fields based on FL=* would be difficult. The only possible way right now is to
iterate over all schema fields looking for viable docValue fields for each
matched document.
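
Point 2 can be made concrete with a hypothetical sketch (plain Java, made-up names). The viable fields (not stored, docValues, single-valued) depend only on the schema, but without per-document metadata every matched document still has to be probed for every one of them, so the number of lookups is the number of documents times the number of viable fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fl=* problem: with no per-document metadata,
// every matched document is probed for every viable schema field.
class FlStarScan {

  static final class SchemaField {
    final String name;
    final boolean stored;
    final boolean docValues;
    final boolean multiValued;

    SchemaField(String name, boolean stored, boolean docValues, boolean multiValued) {
      this.name = name;
      this.stored = stored;
      this.docValues = docValues;
      this.multiValued = multiValued;
    }
  }

  /** Single-valued docValues lookup; returns null when the document has no value. */
  interface DocValuesSource {
    Object get(String field, int docId);
  }

  /** Schema-level filter: unstored, docValues-enabled, single-valued fields. */
  static List<String> viableFields(List<SchemaField> schema) {
    List<String> names = new ArrayList<>();
    for (SchemaField sf : schema) {
      if (!sf.stored && sf.docValues && !sf.multiValued) {
        names.add(sf.name);
      }
    }
    return names;
  }

  /** Returns docId -> (field -> value), doing one lookup per (document, viable field). */
  static Map<Integer, Map<String, Object>> loadAll(List<Integer> docIds,
                                                   List<SchemaField> schema,
                                                   DocValuesSource docValues) {
    List<String> viable = viableFields(schema);
    Map<Integer, Map<String, Object>> result = new LinkedHashMap<>();
    for (int docId : docIds) {
      Map<String, Object> values = new LinkedHashMap<>();
      for (String field : viable) {
        Object value = docValues.get(field, docId); // probed even if the doc has no value
        if (value != null) {
          values.put(field, value);
        }
      }
      result.put(docId, values);
    }
    return result;
  }
}
```

With three matched documents and two viable fields, loadAll() performs six lookups even if only one document actually has a value.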

3) More of a question: What kind of FieldType should be created when adding
these docValues to the document?


I have an alternate patch that attempts to preload stored values from docValues
before handing over to the IndexReader. Those fields are then skipped later
on. It passes tests, but it's not a very elegant approach, and it also has
limitations: it only works when FL is specified, and it wouldn't work on a
cached hit of the document if a field is loaded with LazyDocument.


> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220.patch
>
>
> Often a field will be both stored="true" and docValues="true", which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc.), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization.
> The only caveat I can see is multiValued fields, since they would always be returned sorted in the docValues approach. I believe this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues; facets, analytics, streaming, etc. all seem to do it their own way, so perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation.)
> Parameters for fl:
> a) fl="docValueField"
>   -- return the field from docValues if it is not stored but has docValues; if the field is stored, return it from stored fields
> b) fl="*"
>   -- return only stored fields
> c) fl="+"
>   -- return stored fields and docValue fields
> 2a would be the easiest to implement and might be sufficient for a first pass. 2b is the current behavior.
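
The proposed fl semantics can be summarized as a small resolution function. This is a hypothetical sketch of the proposal as written, not Solr's actual fl handling: FlResolver, Selection and Source are made-up names, field names in fl are assumed to be comma-separated, and multiValued handling is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed fl semantics, not Solr's real fl parsing:
//   fl=<names> : each named field, from stored fields if stored, else from docValues
//   fl=*       : stored fields only (current behavior)
//   fl=+       : stored fields plus docValues-only fields
class FlResolver {

  enum Source { STORED, DOC_VALUES }

  static final class SchemaField {
    final String name;
    final boolean stored;
    final boolean docValues;

    SchemaField(String name, boolean stored, boolean docValues) {
      this.name = name;
      this.stored = stored;
      this.docValues = docValues;
    }
  }

  /** One field to return and where its value would be read from. */
  static final class Selection {
    final String field;
    final Source source;

    Selection(String field, Source source) {
      this.field = field;
      this.source = source;
    }

    @Override
    public String toString() {
      return field + ":" + source;
    }
  }

  /** Resolves an fl value against the schema; results follow schema order. */
  static List<Selection> resolve(String fl, List<SchemaField> schema) {
    Set<String> named = new HashSet<>(Arrays.asList(fl.split(",")));
    List<Selection> out = new ArrayList<>();
    for (SchemaField sf : schema) {
      boolean wanted;
      if (fl.equals("*")) {
        wanted = sf.stored;
      } else if (fl.equals("+")) {
        wanted = sf.stored || sf.docValues;
      } else {
        wanted = named.contains(sf.name) && (sf.stored || sf.docValues);
      }
      if (wanted) {
        // Stored values win when a field has both stored and docValues.
        out.add(new Selection(sf.name, sf.stored ? Source.STORED : Source.DOC_VALUES));
      }
    }
    return out;
  }
}
```

Under this reading, a field with both stored="true" and docValues="true" is always served from stored fields, matching the proposal's "if the field is stored return it from stored fields".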

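An illustrative model of the multiValued caveat, with hypothetical names and no Lucene calls: the stored copy keeps values in the order they were indexed, while a docValues read returns them sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative model only (no Lucene calls): contrasts the value order a
// client sees for a multiValued field read from stored fields vs. docValues.
class MultiValuedOrder {

  /** Stored fields return values in the order they were indexed. */
  static List<String> readFromStored(List<String> indexed) {
    return new ArrayList<>(indexed);
  }

  /** docValues return the same values in sorted order. */
  static List<String> readFromDocValues(List<String> indexed) {
    List<String> values = new ArrayList<>(indexed);
    // Collections.sort uses String natural order; Lucene orders docValues terms
    // by their UTF-8 bytes, which agrees for plain ASCII values like these.
    Collections.sort(values);
    return values;
  }
}
```

For values indexed as red, green, blue, the stored read gives [red, green, blue] and the docValues read gives [blue, green, red].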


