Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2018/03/15 02:22:00 UTC
[jira] [Commented] (SOLR-11891) BinaryResponseWriter fetches unnecessary fields

    [ https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399802#comment-16399802 ] 

Hoss Man commented on SOLR-11891:
---------------------------------



bq. FYI I attached the diff we made to DocsStreamer.  

Thanks wei -- unfortunately your patch breaks compilation of some other classes (on master) and also suffers from an NPE in the case where globs are used (ex: {{fl=\*}}).

I started down the road of a more "optimized" patch with what I suggested above...

bq. I think the ideal "fix" is that SolrReturnFields.getLuceneFieldNames() should get passed all the way down into convertLuceneDocToSolrDoc (or something we refactor it into) so that we do a runtime check of which list is smaller, SolrReturnFields.getLuceneFieldNames() or Document.getFields(), and then loop over that (smaller) list.

...and I've currently got a patch which implements this, along with a whitebox test asserting that the "optimization" is being used -- but while working on it I realized this isn't actually an optimization...

{code}
for (String fname : returnFieldNames) {
  for (IndexableField f : doc.getFields(fname)) {
    // do stuff
  }
}
{code}
The problem is that {{Document}} isn't a Map -- it doesn't have efficient lookup of the values associated with a field name. In order to do the {{fieldname=>value[]}} lookup of {{doc.getFields(fname)}}, it has to do an iterative scan of all the internal {{IndexableField}} instances (it can't even short-circuit when it finds one, because there could be multiple fields with the same name, and there's no guarantee they are in a predictable order).

So with this "optimization" we're actually introducing *more* loops over all the {{IndexableField}} instances: one full scan per requested field name.
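To make the cost concrete, here is a minimal sketch using a hypothetical, simplified stand-in for a Lucene {{Document}} ({{FieldScanDemo}}, {{SimpleDocument}} and {{nestedLoopComparisons}} are made-up names for illustration, not the Lucene API). It assumes, as described above, that looking up a field by name means scanning every stored field, and it counts the name comparisons the nested loop performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a Lucene Document (NOT the Lucene API):
// an ordered list of (name, value) pairs with no index by field name.
final class FieldScanDemo {

    static final class Field {
        final String name;
        final String value;
        Field(String name, String value) { this.name = name; this.value = value; }
    }

    static final class SimpleDocument {
        private final List<Field> fields = new ArrayList<>();
        int comparisons = 0; // name comparisons performed by getFields(String)

        void add(String name, String value) { fields.add(new Field(name, value)); }

        // Mimics a by-name lookup without an index: every field must be checked,
        // since the same name may occur several times, in any position.
        List<Field> getFields(String name) {
            List<Field> result = new ArrayList<>();
            for (Field f : fields) {
                comparisons++;
                if (f.name.equals(name)) {
                    result.add(f);
                }
            }
            return result;
        }
    }

    // The "loop over the requested names" approach: one full scan per name.
    static int nestedLoopComparisons(SimpleDocument doc, List<String> requested) {
        doc.comparisons = 0;
        for (String fname : requested) {
            for (Field f : doc.getFields(fname)) {
                // do stuff with f (omitted)
            }
        }
        return doc.comparisons;
    }

    public static void main(String[] args) {
        SimpleDocument doc = new SimpleDocument();
        for (int i = 0; i < 50; i++) {
            doc.add("f" + i, "v" + i);
        }
        doc.add("id", "42");
        // 2 requested names x 51 stored fields
        System.out.println(nestedLoopComparisons(doc, List.of("id", "score"))); // prints 102
    }
}
```

With 51 stored fields and two requested names, the nested loop performs 102 comparisons, whereas a single pass over the fields needs 51; the gap grows with every extra name in {{fl}}.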

The key reason wei was probably able to see an improvement with the change mentioned is that, at least once {{convertDocumentToSolrDocument}} is done, the final {{SolrDocument}} is as small as possible, so the *subsequent* scans in the ResponseWriter are faster.

We should be able to accomplish the same speedup "safely" by ensuring that when we do loop over the {{IndexableField}} instances, we check {{ReturnFields.wantsField(fname)}}.
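A minimal sketch of that single-pass approach, again using hypothetical simplified types rather than the real Solr/Lucene classes: {{WantsFieldDemo}} and {{convert}} are made-up names, {{wantsField}} is a plain {{Predicate<String>}} standing in for {{ReturnFields.wantsField(String)}}, and the resulting "SolrDocument" is modeled as a map from field name to values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class WantsFieldDemo {

    // Hypothetical stored field (name, value); not Lucene's IndexableField.
    static final class Field {
        final String name;
        final String value;
        Field(String name, String value) { this.name = name; this.value = value; }
    }

    // One pass over the stored fields, keeping only the requested ones.
    // wantsField is a stand-in for ReturnFields.wantsField(String).
    static Map<String, List<String>> convert(List<Field> storedFields,
                                             Predicate<String> wantsField) {
        Map<String, List<String>> solrDoc = new LinkedHashMap<>();
        for (Field f : storedFields) {
            if (!wantsField.test(f.name)) {
                continue; // skip unrequested fields before any copying
            }
            solrDoc.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
        return solrDoc;
    }

    public static void main(String[] args) {
        List<Field> stored = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            stored.add(new Field("f" + i, "v" + i));
        }
        stored.add(new Field("id", "42"));
        Set<String> fl = Set.of("id", "score");
        System.out.println(convert(stored, fl::contains)); // prints {id=[42]}
    }
}
```

Passing {{name -> true}} models a glob such as {{fl=\*}}, where every stored field is kept, so the glob case needs no special handling inside the loop.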

I'll work on a revised (and much simpler) patch tomorrow.




> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
>                 Key: SOLR-11891
>                 URL: https://issues.apache.org/jira/browse/SOLR-11891
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Response Writers
>    Affects Versions: 5.4, 6.4.2, 6.6.2
>            Reporter: wei wang
>            Priority: Major
>         Attachments: DocsStreamer.java.diff, SOLR-11891.patch.BAD
>
>
> We observe that Solr query time increases significantly with the number of rows requested, even though all we retrieve for each document is just fl=id,score. After some debugging, we see that most of the increased time is spent in BinaryResponseWriter converting the Lucene document into a SolrDocument, inside convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182] 
> I am a bit puzzled why we need to iterate through all the fields in the document. Why can’t we just iterate through the requested field list?    
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156] 
> e.g. when we pass in the field list as
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in our case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
