Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2017/12/18 04:40:00 UTC
[jira] [Commented] (SOLR-11769) Sorting performance degrades when useFilterForSortedQuery is enabled and there is no filter query specified

    [ https://issues.apache.org/jira/browse/SOLR-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294516#comment-16294516 ] 

David Smiley commented on SOLR-11769:
-------------------------------------

I think the solution is as simple as wrapping lines 1400 & 1401 in a new if block that checks that cmd.getFilterList() is non-null and non-empty.  I think the idea of "useFilterForSortedQuery" is still relevant when there are no filters, because the query itself is also considered: it will be cached here (by the getDocSet call on line 1399) and thus benefit from the optimization.
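
To make the suggested guard concrete, here is a rough, self-contained sketch. It is not the actual SolrIndexSearcher code and not a tested patch: plain integer sets stand in for DocSet, and the names FilterGuardSketch, docSetForFilters, resolve and filterListLookups are hypothetical, invented only for this illustration. The counter merely shows that the filter-list lookup is skipped when the list is null or empty.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FilterGuardSketch {

    // Counts calls to the hypothetical filter-list lookup so the guard's effect is visible.
    static int filterListLookups = 0;

    // Hypothetical stand-in for getDocSet(List<Query>): intersects all filter sets.
    static Set<Integer> docSetForFilters(List<Set<Integer>> filters) {
        filterListLookups++;
        Set<Integer> result = null;
        for (Set<Integer> f : filters) {
            if (result == null) {
                result = new TreeSet<>(f);
            } else {
                result.retainAll(f);
            }
        }
        return result;
    }

    // Mirrors the shape of lines 1399-1401, with the proposed guard added.
    static Set<Integer> resolve(Set<Integer> queryDocs, List<Set<Integer>> filterList) {
        // Stand-in for the line 1399 lookup of the main query.
        Set<Integer> docSet = new TreeSet<>(queryDocs);
        // Proposed guard: only consult the filter list when there is one.
        if (filterList != null && !filterList.isEmpty()) {
            Set<Integer> bigFilt = docSetForFilters(filterList);
            if (bigFilt != null) {
                docSet.retainAll(bigFilt);
            }
        }
        return docSet;
    }

    public static void main(String[] args) {
        Set<Integer> queryDocs = new TreeSet<>(Arrays.asList(1, 2, 3));

        // No filters: the filter-list lookup is never invoked.
        System.out.println(resolve(queryDocs, null));
        System.out.println(resolve(queryDocs, Collections.emptyList()));
        System.out.println("lookups so far: " + filterListLookups);

        // One filter: the lookup runs once and the sets are intersected.
        Set<Integer> filter = new TreeSet<>(Arrays.asList(2, 3, 4));
        System.out.println(resolve(queryDocs, Collections.singletonList(filter)));
        System.out.println("lookups so far: " + filterListLookups);
    }
}
```

In the real method the equivalent change would presumably be an if around the bigFilt lines only, leaving the line 1399 call and sortDocSet untouched.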

> Sorting performance degrades when useFilterForSortedQuery is enabled and there is no filter query specified
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11769
>                 URL: https://issues.apache.org/jira/browse/SOLR-11769
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 4.10.4
>         Environment: OS: macOS Sierra (version 10.12.4)
> Memory: 16GB
> CPU: 2.9 GHz Intel Core i7
> Java Version: 1.8
>            Reporter: Betim Deva
>              Labels: performance
>
> The performance of sorting degrades significantly when the {{useFilterForSortedQuery}} is enabled, and there's no filter query specified.
> *Steps to Reproduce:*
> 1. Set {{useFilterForSortedQuery=true}} in {{solrconfig.xml}}
> 2. Run a query to match and return a single document. Also add sorting
> - Example {{/select?q=foo:123&sort=bar+desc}}
> With a large index (> 10 million documents), this yields a slow response (a few hundred milliseconds on average) even when the result set consists of a single document.
> *Observation 1:*
> - Disabling {{useFilterForSortedQuery}} improves the performance to < 1ms
> *Observation 2:*
> - Removing the {{sort}} improves the performance to < 1ms
> *Observation 3:*
> - Keeping the {{sort}}, and adding any filter query (such as {{fq=\*:\*}}) improves the performance to < 1 ms.
> After profiling [SolrIndexSearcher.java|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java;h=9ee5199bdf7511c70f2cc616c123292c97d36b5b;hb=HEAD#l1400], I found that the bottleneck is on
> {{DocSet bigFilt = getDocSet(cmd.getFilterList());}} 
> when {{cmd.getFilterList()}} is passed in as {{null}}. This makes the {{getDocSet()}} function collect document ids every single time it is called, without any caching.
> {code:java}
> 1394     if (useFilterCache) {
> 1395       // now actually use the filter cache.
> 1396       // for large filters that match few documents, this may be
> 1397       // slower than simply re-executing the query.
> 1398       if (out.docSet == null) {
> 1399         out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
> 1400         DocSet bigFilt = getDocSet(cmd.getFilterList());
> 1401         if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
> 1402       }
> 1403       // todo: there could be a sortDocSet that could take a list of
> 1404       // the filters instead of anding them first...
> 1405       // perhaps there should be a multi-docset-iterator
> 1406       sortDocSet(qr, cmd);
> 1407     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
