Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2006/10/07 18:01:20 UTC
[jira] Commented: (SOLR-52) Lazy Field loading

    [ http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440682 ] 
            
Yonik Seeley commented on SOLR-52:
----------------------------------

+1, looks good.

There are some small backward incompatibilities (any place that returns a Fieldable, like getUniqueKeyField), but it can't be helped, and it's fairly expert level anyway.

My only concern was about a memory increase for lazy-loaded short fields.  I reviewed some of the LazyField code just now, and it looks like this shouldn't be the case:
 - LazyField is an inner class that contains an extra 3 members.  Its outer class, to which it retains a reference, is FieldsReader.  The FieldsReader instance is a member of SegmentReader, and has the same lifetime as the SegmentReader.  Hence the LazyField won't extend the lifetime of any other objects.

One thing I did see is that the internal char[] buffer used to read the string in LazyField is a member for some reason (hence the data will be stored in the field *twice*).  I think this is probably a bug, and I'll bring it up on the Lucene list.
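
To illustrate the double-storage concern, here is a minimal sketch. It is not the actual Lucene LazyField code: BufferAsMember, BufferAsLocal, the Supplier-based reader and retainedChars() are all hypothetical names made up for this example. It only shows why keeping the read buffer as a member retains the characters twice once the String has been built, while a local buffer does not.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for a lazily loaded stored field.
// These are NOT Lucene classes; they only model the buffer lifetime.

// Variant 1: the read buffer is a member, so it stays reachable
// for as long as the field object does.
class BufferAsMember {
    private final Supplier<char[]> reader; // assumed: reads the raw chars on demand
    private char[] chars;                  // retained after the load
    private String value;                  // holds its own copy of the chars

    BufferAsMember(Supplier<char[]> reader) { this.reader = reader; }

    String stringValue() {
        if (value == null) {
            chars = reader.get();
            value = new String(chars);     // new String(char[]) copies the array
        }
        return value;
    }

    // Number of chars this object keeps alive (buffer + String).
    int retainedChars() {
        return (chars == null ? 0 : chars.length)
             + (value == null ? 0 : value.length());
    }
}

// Variant 2: the buffer is a local, so it becomes garbage
// as soon as stringValue() returns.
class BufferAsLocal {
    private final Supplier<char[]> reader;
    private String value;

    BufferAsLocal(Supplier<char[]> reader) { this.reader = reader; }

    String stringValue() {
        if (value == null) {
            char[] chars = reader.get();   // local buffer only
            value = new String(chars);
        }
        return value;
    }

    // Only the String is kept alive.
    int retainedChars() {
        return value == null ? 0 : value.length();
    }
}
```

With a five-character value, the member variant keeps ten chars reachable after the first stringValue() call and the local variant keeps five.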

Ideas for future optimizations:
- if there is no document cache, change lazy to no-load
- special cases: if only a single field (like the ID field) is selected out of many documents to be returned, consider bypassing the doc cache and using LOAD_AND_BREAK if we know there is only a single value.
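
A rough sketch of how those two ideas could be expressed as a per-field decision. Everything here is an assumption for illustration: the LoadMode enum is a local stand-in whose names are modeled on the load modes mentioned above, and LoadPolicy, useDocCache and singleValued are hypothetical names, not part of the patch or of Lucene.

```java
import java.util.Set;

// Local stand-in for the load modes discussed above (not a Lucene type).
enum LoadMode { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

// Hypothetical policy object: decides how each stored field should be read.
final class LoadPolicy {
    private final Set<String> requested; // fields named in the field list
    private final boolean useDocCache;   // false if there is no cache or it is bypassed
    private final boolean singleValued;  // assumed known, e.g. for the ID field

    LoadPolicy(Set<String> requested, boolean useDocCache, boolean singleValued) {
        this.requested = requested;
        this.useDocCache = useDocCache;
        this.singleValued = singleValued;
    }

    LoadMode modeFor(String fieldName) {
        if (requested.contains(fieldName)) {
            // One single-valued field and no cache to fill:
            // stop reading the document once the value is found.
            if (!useDocCache && requested.size() == 1 && singleValued) {
                return LoadMode.LOAD_AND_BREAK;
            }
            return LoadMode.LOAD;
        }
        // Unrequested fields: lazy only pays off if the document is
        // cached and may be reused; otherwise skip them entirely.
        return useDocCache ? LoadMode.LAZY_LOAD : LoadMode.NO_LOAD;
    }
}
```

With the cache in use, unrequested fields stay lazy so the cached document is still complete; without it they are simply not loaded.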

> Lazy Field loading
> ------------------
>
>                 Key: SOLR-52
>                 URL: http://issues.apache.org/jira/browse/SOLR-52
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Mike Klaas
>         Assigned To: Mike Klaas
>            Priority: Minor
>         Attachments: lazyfields_patch.diff
>
>
> Add lazy field loading to solr.
> Currently solr reads all stored fields and filters the undesired fields based on the field list.  This is usually not a performance concern, but when using solr to store large numbers of fields, or just one large field (doc contents, eg. for highlighting), it is perceptible.
> Now, there is a concern with the doc cache of SolrIndexSearcher, which assumes it has the whole document in the cache.  To maintain this invariant, it is still the case that all the fields in a document are loaded in a searcher.doc(i) call.  However, if a field set is given to the method, only the given fields are loaded directly, while the rest are loaded lazily.
> Some concerns about lazy field loading
>   1. Lazy fields are only valid while the IndexReader is open.  I believe this is fine since the IndexReader is kept alive by the SolrIndexSearcher, so all docs in the cache have the reader available.  
>   2. It is slower to read a field lazily and retrieve its value later than to retrieve it directly to begin with (though I don't know by how much; it depends on I/O factors).  We certainly don't want this to be the common case.  I added an optional call which accumulates all the fields likely to be used in the request (highlighting, response writing), and populates the IndexSearcher cache a priori.  This has the added advantage of concentrating doc retrieval in a single place, which is nice from a performance testing perspective.
>  3. LazyFields are incompatible with the sundry Field declarations scattered about Solr.  I believe I've changed all the necessary locations to Fieldable.
> Comments appreciated
