You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "Yael Mushinsky (Jira)" <ji...@apache.org> on 2021/08/18 11:46:00 UTC
[jira] [Commented] (SOLR-15586) After upgrading to Solr 8.7 from 5.2.1 seems that asking for 'score' in 'fl' causes the query to take almost twice (or so) the time (regression?)

    [ https://issues.apache.org/jira/browse/SOLR-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400988#comment-17400988 ] 

Yael Mushinsky commented on SOLR-15586:
---------------------------------------

Tried to remove the code colored in green with small patch:

org.apache.solr.search.Grouping.CommandFunc:

@Override

    @SuppressWarnings(\{"unchecked"})

    protected void finish() throws IOException {

      if (secondPass != null)

{         result = secondPass.getTopGroups(0);

        {color:#00875a}*populateScoresIfNecessary();*  {color}    

}

 

I get the scores and everything seems working ... what is it's purpose?

 

> After upgrading to Solr 8.7 from 5.2.1 seems that asking for 'score' in 'fl' causes the query to take almost twice (or so) the time (regression?)
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15586
>                 URL: https://issues.apache.org/jira/browse/SOLR-15586
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.7
>            Reporter: Yael Mushinsky
>            Priority: Major
>
>  In performance tests conducted on the same system, comparing its performance with Solr 5.2.1 verses 8.7, was found a degradation in the query execution time.
> The system uses a custom query parser, which constructs quite a large query object.
> When removing the ‘score’ parameter from the ‘fl’ list, the times were back as in Solr 5.
> Trying to trouble shoot using jstack (small bash script that take a jstack of the Solr thread every few milliseconds) found that the Solr 8 thread “spends” time in the stack: (was appearing in several jstack snippets) "qtp2038105753-19" #19 prio=5 os_prio=0 cpu=1837.18ms elapsed=10334.56s tid=0x00007f7da0973000 nid=0x6ddd runnable  [0x00007f7d6d5f2000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.pread0([java.base@13.0.2/Native|mailto:java.base@13.0.2/Native%3cmailto:java.base@13.0.2/Native] Method)
>         at sun.nio.ch.FileDispatcherImpl.pread([java.base@13.0.2/FileDispatcherImpl.java:54|mailto:java.base@13.0.2/FileDispatcherImpl.java:54%3cmailto:java.base@13.0.2/FileDispatcherImpl.java:54])
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer([java.base@13.0.2/IOUtil.java:274|mailto:java.base@13.0.2/IOUtil.java:274%3cmailto:java.base@13.0.2/IOUtil.java:274])
>         at sun.nio.ch.IOUtil.read([java.base@13.0.2/IOUtil.java:245|mailto:java.base@13.0.2/IOUtil.java:245%3cmailto:java.base@13.0.2/IOUtil.java:245])
>         at sun.nio.ch.FileChannelImpl.readInternal([java.base@13.0.2/FileChannelImpl.java:811|mailto:java.base@13.0.2/FileChannelImpl.java:811%3cmailto:java.base@13.0.2/FileChannelImpl.java:811])
>         at sun.nio.ch.FileChannelImpl.read([java.base@13.0.2/FileChannelImpl.java:796|mailto:java.base@13.0.2/FileChannelImpl.java:796%3cmailto:java.base@13.0.2/FileChannelImpl.java:796])
>         at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:170)
>         at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:315)
>         at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:254)
>         at org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)
>         at org.apache.lucene.util.fst.FST.readLabel(FST.java:596)
>         at org.apache.lucene.util.fst.FST.readArc(FST.java:1259)
>         at org.apache.lucene.util.fst.FST.readNextRealArc(FST.java:1247)
>         at org.apache.lucene.util.fst.FST.readFirstRealTargetArc(FST.java:1091)
>         at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1396)
>         at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:485)
>         at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekExact(FilterLeafReader.java:184)
>         at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:124)
>         at org.apache.lucene.index.TermStates.build(TermStates.java:109)
>         at org.apache.lucene.search.PhraseQuery$1.getStats(PhraseQuery.java:447)
>         at org.apache.lucene.search.PhraseWeight.<init>(PhraseWeight.java:38)
>         at org.apache.lucene.search.PhraseQuery$1.<init>(PhraseQuery.java:429)
>         at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:429)
>         at org.apache.lucene.search.BoostQuery.createWeight(BoostQuery.java:125)
>         at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
>         at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
>         at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
>         at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
>         at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:563)
>         at org.apache.solr.search.Grouping$Command.populateScoresIfNecessary(Grouping.java:593)
>         at org.apache.solr.search.Grouping$CommandFunc.finish(Grouping.java:1007)
>         at org.apache.solr.search.Grouping.execute(Grouping.java:408)
>         at org.apache.solr.handler.component.QueryComponent.doProcessGroupedSearch(QueryComponent.java:1486)
>         at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:395)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
>  
> This doesn’t appear in any jstack snippet taken of the Solr 5 thread, when ‘score’ is also asked for.
> Trying to look a bit into the code, I see that in solr 8 the calculation of score “if needed” is performed as a separate phase, after the document collection flow (might be that its like collecting the documents again in a way? I/O etc. ) :
>  
> org.apache.solr.search.Grouping.execute():
>    if (!collectors.isEmpty()) {
>       Collector secondPhaseCollectors = MultiCollector.wrap(collectors.toArray(new Collector[collectors.size()]));
>       if (collectors.size() > 0) {
>         if (cachedCollector != null) {
>           if (cachedCollector.isCached()) {
>             cachedCollector.replay(secondPhaseCollectors);
>           } else {
>             signalCacheWarning = true;
>             log.warn(String.format(Locale.ROOT, "The grouping cache is active, but not used because it exceeded the max cache limit of %d percent", maxDocsPercentageToCache));
>             log.warn("Please increase cache size or disable group caching.");
>             searchWithTimeLimiter(luceneFilter, secondPhaseCollectors);
>           }
>         } else {
>           if (pf.postFilter != null) {
>             pf.postFilter.setLastDelegate(secondPhaseCollectors);
>             secondPhaseCollectors = pf.postFilter;
>           }
>           searchWithTimeLimiter(luceneFilter, secondPhaseCollectors); //->seems to me this is the main document collection flow
>         }
>         if (secondPhaseCollectors instanceof DelegatingCollector) {
>           ((DelegatingCollector) secondPhaseCollectors).finish();
>         }
>       }
>     }
>  
>     for (@SuppressWarnings(\{"rawtypes"})Command cmd : commands) {
>       cmd.finish();
>     }
>  
> org.apache.solr.search.Grouping.CommandFunc:
> @Override
>     @SuppressWarnings(\{"unchecked"})
>     protected void finish() throws IOException {
>       if (secondPass != null) {
>         result = secondPass.getTopGroups(0);
>         populateScoresIfNecessary();
>       }
>  
>  
> org.apache.solr.search.Grouping.Command<T>:
> protected void populateScoresIfNecessary() throws IOException {
>       if (needScores) {
>         for (GroupDocs<?> groups : result.groups) {
>           TopFieldCollector.populateScores(groups.scoreDocs, searcher, query);
>         }
>       }
>     }
>  
>  
> I know that asking for the score has a cost of performance, but in Solr 5 it didn’t seem to be meaningful and in Solr 8 it seems to be very expansive.
> (In means of code, it seems the code is Solr 5 is different, I didn’t get into finding exactly how and where it calculates the score to add to the document upon request, but in means of timing, it doesn’t seem to have such an effect.)
> Tried asking about this over the users mailing list, but didn't get an answer ...
> Can you please assist in understanding the change in performance?
> Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org