Posted to solr-user@lucene.apache.org by Wei <we...@gmail.com> on 2018/01/20 00:44:27 UTC

BinaryResponseWriter fetches unnecessary fields?

Hi all,


We observe that Solr query time increases significantly with the number of
rows requested, even though all we retrieve for each document is
fl=id,score. After some debugging, we see that most of the added time is
spent in BinaryResponseWriter, converting each Lucene document into a
SolrDocument.


Inside convertLuceneDocToSolrDoc():


https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182


   for (IndexableField f : doc.getFields())


I am a bit puzzled why we need to iterate through all the fields in the
document. Why can’t we just iterate through the requested fields in fl?
Specifically:



https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156


If we change

        sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema())

to

        sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)

and iterate only over fnames inside convertLuceneDocToSolrDoc(), we see a
significant performance boost in our case: the increase in query time from
rows=128 to rows=500 is much smaller. Am I missing something here?
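
To make the proposal concrete, here is a minimal sketch of the two loops.
It is not the Solr code: StoredField and the Map result are hypothetical
stand-ins for IndexableField and SolrDocument, and the method names
convertAll and convertRequested are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only. StoredField stands in for Lucene's IndexableField and a
// plain Map stands in for SolrDocument; none of this is the actual Solr code.
class FieldProjectionSketch {

    // Hypothetical stand-in for one stored field of a document.
    static final class StoredField {
        final String name;
        final Object value;

        StoredField(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    // Behavior described in the thread: every field in the document is converted.
    static Map<String, List<Object>> convertAll(List<StoredField> doc) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (StoredField f : doc) {
            out.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
        return out;
    }

    // Proposed behavior: iterate over the requested names and copy only those.
    static Map<String, List<Object>> convertRequested(List<StoredField> doc,
                                                      List<String> fnames) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (String fname : fnames) {
            for (StoredField f : doc) {
                if (f.name.equals(fname)) {
                    out.computeIfAbsent(fname, k -> new ArrayList<>()).add(f.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<StoredField> doc = Arrays.asList(
            new StoredField("id", "doc-1"),
            new StoredField("title", "hello"),
            new StoredField("body", "a large stored value"));

        System.out.println(convertAll(doc).keySet());                            // [id, title, body]
        System.out.println(convertRequested(doc, Arrays.asList("id")).keySet()); // [id]
    }
}
```

In this sketch the lookup still scans the field list; the saving is that
values for unrequested fields are never converted or copied.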


Thanks,

Wei

Re: BinaryResponseWriter fetches unnecessary fields?

Posted by Chris Hostetter <ho...@fucit.org>.

: Thanks Chris! Is RetrieveFieldsOptimizer a new functionality introduced in
: 7.x?  Our observation is with both 5.4 & 6.4.  I have created a jira for
: the issue:

The same basic code path (related to stored fields) probably existed 
largely as is in 5.x and 6.x and was then later refactored into  
RetrieveFieldsOptimizer where it knows about things like the 
useDocValuesAsStored option/optimization.

-Hoss
http://www.lucidworks.com/

Re: BinaryResponseWriter fetches unnecessary fields?

Posted by Wei <we...@gmail.com>.

Thanks Chris! Is RetrieveFieldsOptimizer a new functionality introduced in
7.x?  Our observation is with both 5.4 & 6.4.  I have created a jira for
the issue:

https://issues.apache.org/jira/browse/SOLR-11891

I am also wondering how enableLazyFieldLoading affects this case, but I
haven't tested it yet. Please let us know if you catch anything.


Thanks,
Wei


On Mon, Jan 22, 2018 at 3:15 PM, Chris Hostetter <ho...@fucit.org>
wrote:

>
> : Inside convertLuceneDocToSolrDoc():
> :
> :
> : https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182
> :
> :
> :    for (IndexableField f : doc.getFields())
> :
> :
> : I am a bit puzzled why we need to iterate through all the fields in the
> : document. Why can’t we just iterate through the requested fields in fl?
> : Specifically:
>
> I have a hunch here -- but i haven't verified it.
>
> First of all: the specific code in question that you mention assumes it
> doesn't *need* to filter out the result of "doc.getFields()" based on the
> 'fl' because at the point in the processing where the DocsStreamer is
> looping over the result of "doc.getFields()" the "Document" object it's
> dealing with *should* only contain the specific (subset of stored) fields
> requested by the fl param -- this is handled by RetrieveFieldsOptimizer &
> SolrDocumentFetcher that the DocsStreamer builds up according to the
> results of ResultContext.getReturnFields() when asking the
> SolrIndexSearcher to fetch the doc()
>
> But i think what's happening here is that because of the documentCache,
> there are cases where the SolrIndexSearcher is not actually using
> a SolrDocumentStoredFieldVisitor to limit what's requested from the
> IndexReader, and the resulting Document contains all fields -- which is
> then compounded by code that loops over every field.
>
> At a quick glance, I'm a little fuzzy on how exactly
> enableLazyFieldLoading may/may-not be affecting things here, but either
> way I think you are correct -- we can/should make this overall stack of
> code smarter about looping over fields we know we want, vs looping over
> all fields in the doc.
>
> Can you please file a jira for this?
>
>
> -Hoss
> http://www.lucidworks.com/

Re: BinaryResponseWriter fetches unnecessary fields?

Posted by Chris Hostetter <ho...@fucit.org>.

: Inside convertLuceneDocToSolrDoc():
: 
: 
: https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182
: 
: 
:    for (IndexableField f : doc.getFields())
: 
: 
: I am a bit puzzled why we need to iterate through all the fields in the
: document. Why can’t we just iterate through the requested fields in fl?
: Specifically:

I have a hunch here -- but i haven't verified it.

First of all: the specific code in question that you mention assumes it 
doesn't *need* to filter out the result of "doc.getFields()" based on the
'fl' because at the point in the processing where the DocsStreamer is 
looping over the result of "doc.getFields()" the "Document" object it's 
dealing with *should* only contain the specific (subset of stored) fields 
requested by the fl param -- this is handled by RetrieveFieldsOptimizer & 
SolrDocumentFetcher that the DocsStreamer builds up according to the
results of ResultContext.getReturnFields() when asking the 
SolrIndexSearcher to fetch the doc()

But i think what's happening here is that because of the documentCache, 
there are cases where the SolrIndexSearcher is not actually using
a SolrDocumentStoredFieldVisitor to limit what's requested from the 
IndexReader, and the resulting Document contains all fields -- which is 
then compounded by code that loops over every field.  
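
A toy model of this hunch, for illustration only. It is not
SolrIndexSearcher or SolrDocumentFetcher: the class CachedFetchSketch, its
fetch() method, and the in-memory "index" map are all hypothetical, and the
cache path loading the whole document is an assumption of the sketch, not
verified behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of the hunch above. Every name here is hypothetical; this is not
// SolrIndexSearcher or SolrDocumentFetcher.
class CachedFetchSketch {
    // Stands in for the stored fields held by the index, keyed by doc id.
    private final Map<Integer, Map<String, Object>> index;
    // Stands in for the documentCache; null means no cache is configured.
    private final Map<Integer, Map<String, Object>> cache;

    CachedFetchSketch(Map<Integer, Map<String, Object>> index, boolean useCache) {
        this.index = index;
        this.cache = useCache ? new HashMap<Integer, Map<String, Object>>() : null;
    }

    // Returns the fields of docId that the caller ends up holding for a given fl.
    Map<String, Object> fetch(int docId, Set<String> fl) {
        if (cache != null) {
            // Assumption for this sketch: the cache path loads the whole
            // document so the entry can be reused by later requests, and fl
            // is ignored here.
            return cache.computeIfAbsent(docId, id -> new LinkedHashMap<>(index.get(id)));
        }
        // No cache: behave like a field-limiting visitor and load only fl.
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : index.get(docId).entrySet()) {
            if (fl.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "doc-1");
        stored.put("title", "hello");
        stored.put("body", "a large stored value");
        Map<Integer, Map<String, Object>> index = new HashMap<>();
        index.put(0, stored);

        Set<String> fl = new HashSet<>(Arrays.asList("id"));
        System.out.println(new CachedFetchSketch(index, true).fetch(0, fl).keySet());  // [id, title, body]
        System.out.println(new CachedFetchSketch(index, false).fetch(0, fl).keySet()); // [id]
    }
}
```

With the cache enabled, a loop over every field of the returned document
touches title and body even though fl asked only for id, which is the
compounding effect described above.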

At a quick glance, I'm a little fuzzy on how exactly 
enableLazyFieldLoading may/may-not be affecting things here, but either 
way I think you are correct -- we can/should make this overall stack of 
code smarter about looping over fields we know we want, vs looping over 
all fields in the doc.

Can you please file a jira for this?


-Hoss
http://www.lucidworks.com/