Posted to issues@solr.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2021/03/14 05:36:00 UTC

[jira] [Commented] (SOLR-15246) A unified highlighting search under solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out.

    [ https://issues.apache.org/jira/browse/SOLR-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301047#comment-17301047 ] 

David Smiley commented on SOLR-15246:
-------------------------------------

Wow, that's some slow highlighting!

Highlighting all stored fields with {{hl.fl=*}} is suspicious.  Are you sure you want to do that?  It's unclear which fields you are actually searching on; maybe you should highlight just those.  Consider setting {{hl.requireFieldMatch=true}}.
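
As a rough sketch of what a narrower request could look like, the snippet below builds a select URL that highlights only named fields and sets {{hl.requireFieldMatch=true}}.  The field names are simply the ones from the {{fl}} parameter in the logged requests, and the host, port and core in the base URL are assumptions for illustration; substitute whatever you actually search on.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, highlight_fields):
    """Build a Solr select URL that highlights only the given fields."""
    params = {
        "q": query,
        "hl": "on",
        "hl.method": "unified",
        # Explicit comma-separated field list instead of hl.fl=*
        "hl.fl": ",".join(highlight_fields),
        # Only highlight a field if the query actually matched in that field
        "hl.requireFieldMatch": "true",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL; the field names are assumed from the logged fl parameter.
url = build_highlight_query(
    "http://localhost:8983/solr/holmes/select",
    "test",
    ["description", "specification"],
)
print(url)
```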

It would be interesting to isolate the performance impact of the underlying BreakIterator implementation from the rest of the Highlighter's job by choosing the most trivial implementation.  If you set {{hl.bs.type=SEPARATOR&hl.bs.separator=.}} then I'd be interested to see how much of a difference you see.  It's not a realistic setting because I'm sure there are more periods than sentences, and I think the highlights won't show the final period either... but it's something to compare.
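
To make that comparison concrete, here is a minimal sketch that derives the SEPARATOR variant from an otherwise identical parameter set, so the two requests differ only in the break iterator.  The baseline parameters are copied from the logged request; the helper name is made up for this example, and actually sending the requests and reading QTime is left to whatever client you use.

```python
from urllib.parse import urlencode

# Baseline highlighting parameters, taken from the logged request.
BASE_PARAMS = {
    "q": "test",
    "hl": "on",
    "hl.method": "unified",
    "hl.fl": "*",
    "hl.snippets": "2",
    "hl.maxAnalyzedChars": "1000000",
}


def with_separator_breaks(params, separator="."):
    """Return a copy of params that uses the trivial SEPARATOR break iterator."""
    variant = dict(params)  # copy so the baseline stays untouched
    variant["hl.bs.type"] = "SEPARATOR"
    variant["hl.bs.separator"] = separator
    return variant


separator_params = with_separator_breaks(BASE_PARAMS)

# Run both query strings against the same core and compare the reported QTime.
print(urlencode(BASE_PARAMS))
print(urlencode(separator_params))
```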

If you shard your data more, you can do more highlighting in parallel.

> A unified highlighting search under solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15246
>                 URL: https://issues.apache.org/jira/browse/SOLR-15246
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 8.8, 8.8.1
>         Environment: I was running solr under windows
>            Reporter: Matthew Flowerday
>            Priority: Minor
>
> With Solr 8.8.0, a new unified highlighting parameter &hl.fragAlignRatio was implemented, which defaults to 0.5 if not set. This attempts to improve the highlighting so that the highlighted text does not appear at the far left of the snippet. This works well, but if a search result contains numerous occurrences of the word in question within the record, performance goes right down!
> 2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select params=\{hl.snippets=2&q=test&hl=on&hl.maxAnalyzedChars=1000000&fl=id,description,specification,score&start=20&hl.fl=*&rows=10&_=1614405119134} hits=57008 status=0 QTime=1414320
> 2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException
>               at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
> org.eclipse.jetty.io.EofException: null
>               at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>               at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>               at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>  
> When I set &hl.fragAlignRatio=0.25, results came back much quicker:
> 2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params=\{hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.25&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690} hits=136939 status=0 QTime=87024
> And &hl.fragAlignRatio=0.1
> 2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params=\{hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.1&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690} hits=136939 status=0 QTime=69033
> And &hl.fragAlignRatio=0.0
> 2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params=\{hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.0&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690} hits=136939 status=0 QTime=2841
> I left our setting at 0.0; this is presumably how it was in 7.7.1 (fully left aligned).  I am not sure how many times a word has to occur in a record for performance to go right down, but if it occurs too many times it can have a BIG impact.
> It might be an idea to set the default value to, say, 0.25 instead of 0.5 so that people are not caught out.
> I also noticed that setting &timeAllowed=90000 did not break out of the query until it finished, perhaps because the query itself finished quickly and it was the highlighting that took the time. It might be an idea to have &timeAllowed also cover highlighting so that the request does not run until the Jetty timeout is hit. The machine ran one core at 100% for about 20 mins!
> I raised this at the request of a member of the user forum.
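
For a sense of scale, the three QTime values logged against the {{holmes}} core can be put side by side.  This is only arithmetic on the numbers quoted above, not a new measurement; the first logged request (against {{uleaf}}, with different rows and hits) is left out because it is not directly comparable.

```python
# QTime in milliseconds, keyed by hl.fragAlignRatio, copied from the logged requests.
qtime_ms = {0.25: 87024, 0.1: 69033, 0.0: 2841}

baseline = qtime_ms[0.0]  # fully left-aligned run

for ratio, ms in sorted(qtime_ms.items(), reverse=True):
    slowdown = ms / baseline
    print(f"hl.fragAlignRatio={ratio}: {ms / 1000:.1f} s ({slowdown:.1f}x the 0.0 run)")
```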



--
This message was sent by Atlassian Jira
(v8.3.4#803005)