Posted to solr-user@lucene.apache.org by Kerwin <ke...@gmail.com> on 2021/01/27 07:19:40 UTC

Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

Hi,

After upgrading from Solr 6 to Solr 8, the Unified Highlighter developed a
performance problem: highlighting went from approximately 100 ms to more
than 4 seconds with 76 fields in the hl.q and hl.fl parameters. I
experimented with different options and found that the problem disappears
when the hl.q query targets just one field (any one of them). I do not know
why this would be so. Could you check whether this is a bug or something
else? The problem does not occur with the original highlighter, which
performs the same on Solr 6 and Solr 8 (~1.5 seconds). The highlighting
payload is also mostly the same in all cases.

Original Solr 8 configuration, with poor performance (> 4 s):
<str name="hl.q">{!edismax qf="field1 field2 ..field76" v=$qq}</str>
<str name="hl.fl">field1 field2 ..field76</str>

Solr 8 configuration that restores the Solr 6 performance (~100 ms):
<str name="hl.q">{!edismax qf="field1" v=$qq}</str>
<str name="hl.fl">field1 field2 ..field76</str>
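The two hl.q variants above differ only in the qf list, while hl.fl stays the same. The sketch below builds both strings from a list of field names. `field1`..`field76` are the thread's placeholder names and `HlQueryBuilder` is a hypothetical helper, not part of Solr or SolrJ. One plausible reason the payload barely changes: hl.requireFieldMatch defaults to false, so terms from a single-field hl.q can still be highlighted in every hl.fl field.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper that builds the hl.q local-params strings shown above. */
final class HlQueryBuilder {

  /** Returns {!edismax qf="..." v=$qq} with the given fields as qf. */
  static String edismaxHlQ(List<String> qfFields) {
    return "{!edismax qf=\"" + String.join(" ", qfFields) + "\" v=$qq}";
  }

  /** Placeholder names field1..fieldN, standing in for the real schema fields. */
  static List<String> placeholderFields(int n) {
    return IntStream.rangeClosed(1, n)
        .mapToObj(i -> "field" + i)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> fields = placeholderFields(76);
    String hlFl = String.join(" ", fields);              // hl.fl: all 76 fields in both variants
    String slowHlQ = edismaxHlQ(fields);                 // reported > 4 s on Solr 8 defaults
    String fastHlQ = edismaxHlQ(fields.subList(0, 1));   // reported ~100 ms
    System.out.println("hl.fl=" + hlFl);
    System.out.println("slow hl.q=" + slowHlQ);
    System.out.println("fast hl.q=" + fastHlQ);
  }
}
```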

Other highlighting parameters:
<str name="hl">true</str>
<str name="hl.method">unified</str>
<str name="hl.fragsize">200</str>
<str name="f.resume.content.hl.bs.type">WORD</str>
<str name="hl.bs.language">en</str>
<str name="hl.snippets">10</str>
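The `f.resume.content.hl.bs.type` line uses Solr's per-field override syntax, `f.<field>.<param>`, which takes precedence over the plain parameter for that one field only. A minimal sketch of that lookup, assuming the request parameters sit in a plain `Map` (Solr does this internally via `SolrParams.getFieldParam`; `FieldParams` here is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of Solr's f.<field>.<param> per-field override lookup. */
final class FieldParams {
  private final Map<String, String> params;

  FieldParams(Map<String, String> params) {
    this.params = params;
  }

  /** Returns f.<field>.<name> if present, else <name>, else defaultValue. */
  String get(String field, String name, String defaultValue) {
    String perField = params.get("f." + field + "." + name);
    if (perField != null) {
      return perField;
    }
    return params.getOrDefault(name, defaultValue);
  }

  public static void main(String[] args) {
    // The "other highlighting parameters" from the configuration above.
    Map<String, String> p = new HashMap<>();
    p.put("hl.fragsize", "200");
    p.put("f.resume.content.hl.bs.type", "WORD");
    p.put("hl.bs.language", "en");
    p.put("hl.snippets", "10");
    FieldParams fp = new FieldParams(p);
    System.out.println(fp.get("resume.content", "hl.bs.type", "SENTENCE")); // WORD
    System.out.println(fp.get("field1", "hl.bs.type", "SENTENCE"));         // SENTENCE
    System.out.println(fp.get("field1", "hl.fragsize", null));              // 200
  }
}
```

So only `resume.content` breaks passages on words; the other fields fall back to the hl.bs.type default, which the reference guide documents as SENTENCE for the Unified Highlighter.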

If I remove the hl.q parameter altogether, the response time shoots up to
6-7 seconds, I suspect because our user query is quite large, covers more
fields, and is more complicated.

Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

Posted by Kerwin <ke...@gmail.com>.
Hi David,

Thanks for filing this issue. The classic non-weightMatcher mode works well
for us right now. Yes, we are using the POSTINGS offset source for most of
the fields, although setting it explicitly gives an error since not all
fields are indexed with offsets. So I guess the highlighter is picking the
right choice for each field. Here is the test with hl.offsetSource=ANALYSIS
and hl.weightMatches=false that you requested:

hl.offsetSource=ANALYSIS&hl.weightMatches=false (340 ms)

So even with ANALYSIS it is still better than the original highlighter
(~1.5 seconds). I'll also try to create that PR soon.
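For context on why forcing POSTINGS fails while the default works: when hl.offsetSource is unset, the Unified Highlighter chooses an offset source per field from what that field has in the index. A simplified sketch of that choice follows; `FieldMeta` and `OffsetSourceChooser` are hypothetical stand-ins (Lucene's real code reads `FieldInfo` and has further special cases this sketch ignores).

```java
/**
 * Simplified, hypothetical sketch of how the Unified Highlighter picks a
 * default offset source for one field from its index metadata.
 */
final class OffsetSourceChooser {

  enum OffsetSource { POSTINGS, POSTINGS_WITH_TERM_VECTORS, TERM_VECTORS, ANALYSIS }

  /** Hypothetical stand-in for Lucene's FieldInfo. */
  static final class FieldMeta {
    final boolean offsetsInPostings; // indexed with positions and offsets
    final boolean termVectors;       // term vectors stored

    FieldMeta(boolean offsetsInPostings, boolean termVectors) {
      this.offsetsInPostings = offsetsInPostings;
      this.termVectors = termVectors;
    }
  }

  static OffsetSource choose(FieldMeta f) {
    if (f.offsetsInPostings) {
      // Offsets can be read straight from the index.
      return f.termVectors ? OffsetSource.POSTINGS_WITH_TERM_VECTORS : OffsetSource.POSTINGS;
    }
    if (f.termVectors) {
      return OffsetSource.TERM_VECTORS;
    }
    // Nothing usable in the index: re-analyze the stored text at query time.
    return OffsetSource.ANALYSIS;
  }

  public static void main(String[] args) {
    System.out.println(choose(new FieldMeta(true, false)));  // POSTINGS
    System.out.println(choose(new FieldMeta(false, true)));  // TERM_VECTORS
    System.out.println(choose(new FieldMeta(false, false))); // ANALYSIS
  }
}
```

This is consistent with the behavior described: with a mixed schema, the default lets each field get the best source it supports, whereas hl.offsetSource=POSTINGS forces one choice onto every field, including those without offsets.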

Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

Posted by David Smiley <ds...@apache.org>.
See https://issues.apache.org/jira/browse/SOLR-10321. Near the end, I give
my opinion that we should just omit the field when there is no highlight,
which would remove the need for your workaround, glob or no glob. PR welcome!

It's satisfying seeing that the Unified Highlighter is so much faster than
the original.  I aim to make UH the default in 9.0.  SOLR-12901
<https://issues.apache.org/jira/browse/SOLR-12901>

It's kinda depressing that the weightMatcher mode is slow when there are
many fields, because I was hoping this choice might eventually be permanent
so that lots of code in the highlighter could be made obsolete. I can guess
why it's slow, and I filed an issue for it:
https://issues.apache.org/jira/browse/LUCENE-9712. It's a tough one! Don't
expect anything from me there for the foreseeable future. It'd take either
an ugly hack that would only work under limited conditions, or a substantial
rewrite of much of the UH. At least there's the classic non-weightMatcher
mode, which works faithfully, albeit with some gotchas of its own around
obscure/custom query compatibility.

You said the original highlighter performs at ~1.5 seconds. For the UH, I
suspect you get such fantastic numbers because your offset source is
postings from the index; right? For curiosity's sake, can you please set
hl.offsetSource=ANALYSIS and tell me what speed you get? Set
hl.weightMatches=false as well. My hope is that it's still substantially
better than the original highlighter.

Just because hl.requireFieldMatch=false is the default doesn't mean it's
the _right_ choice for everyone's app :-). I tend to think Solr should flip
this in 9.0 for the sake of both accuracy and performance. Likewise, unset
hl.maxAnalyzedChars; it's mostly an obsolete safety now that the UH is so
much faster.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 29, 2021 at 2:46 AM Kerwin <ke...@gmail.com> wrote:


Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

Posted by Kerwin <ke...@gmail.com>.
On another note, since response time is in question: since Solr 6 I have
been using a custom highlighter that just overrides the encodeSnippets()
method of the UnifiedSolrHighlighter class, because Solr sends back an empty
array (ZERO_LEN_STR_ARRAY) in the response payload for fields that do not
match. Here is the code before:
if (snippet == null) {
  //TODO reuse logic of DefaultSolrHighlighter.alternateField
  summary.add(field, ZERO_LEN_STR_ARRAY);
} ....

So I removed this clause and made the following change:

if (snippet != null) {
  // we used a special snippet separator char and we can now split on it.
  summary.add(field, snippet.split(SNIPPET_SEPARATOR));
}

This has not changed in Solr 8 either, and for 76 fields it produces a very
large payload. So I will keep this custom code for now.
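The only difference between the stock behavior and this override is whether non-matching fields appear in the output at all. A self-contained sketch of that difference, assuming a `LinkedHashMap` in place of Solr's `NamedList` and a NUL character as the snippet separator (both assumptions; `SnippetEncoding` and the field names in `main` are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of per-document snippet encoding, with and without empty fields. */
final class SnippetEncoding {
  /** Assumed separator: a NUL character joining multiple passages of one field. */
  static final String SNIPPET_SEPARATOR = "\u0000";

  /**
   * Encodes one document's snippets. snippets[i] belongs to fieldNames[i] and is
   * null when that field had no match. With omitEmpty=false every field gets an
   * entry (an empty list if unmatched); with omitEmpty=true unmatched fields are skipped.
   */
  static Map<String, List<String>> encode(String[] fieldNames, String[] snippets, boolean omitEmpty) {
    Map<String, List<String>> summary = new LinkedHashMap<>();
    for (int i = 0; i < fieldNames.length; i++) {
      String snippet = snippets[i];
      if (snippet == null) {
        if (!omitEmpty) {
          summary.put(fieldNames[i], Collections.emptyList());
        }
        continue;
      }
      // String.split takes a regex, so quote the separator to match it literally.
      summary.put(fieldNames[i], Arrays.asList(snippet.split(Pattern.quote(SNIPPET_SEPARATOR))));
    }
    return summary;
  }

  public static void main(String[] args) {
    String[] fields = {"title", "body", "tags"};
    String[] snippets = {"first" + SNIPPET_SEPARATOR + "second", null, null};
    System.out.println(encode(fields, snippets, false)); // {title=[first, second], body=[], tags=[]}
    System.out.println(encode(fields, snippets, true));  // {title=[first, second]}
  }
}
```

With 76 requested fields and matches in only a few of them, the stock mode emits an entry for every field of every returned document, which is where the large payload comes from.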

On Fri, Jan 29, 2021 at 12:28 PM Kerwin <ke...@gmail.com> wrote:


Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

Posted by Kerwin <ke...@gmail.com>.
Hi David,

Thanks so much for your reply.
hl.weightMatches was indeed the culprit. After setting it to false, I am
now getting the same sub-second response as Solr 6. I am using Solr 8.6.1
(<luceneMatchVersion>8.6.1</luceneMatchVersion>)

Here are the tests I carried out:
hl.requireFieldMatch=true&hl.weightMatches=true  (2458 ms)
hl.requireFieldMatch=false&hl.weightMatches=true (3964 ms)
hl.requireFieldMatch=true&hl.weightMatches=false (158 ms)
hl.requireFieldMatch=false&hl.weightMatches=false (169 ms) (CHOSEN since
this is consistent with our earlier setting).
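Putting the four timings side by side makes the effect size clear. This small sketch only computes ratios from the numbers reported above; no other data is assumed, and `HlTimings` is just an illustrative name.

```java
/** Speedups computed from the timings reported in this thread (milliseconds). */
final class HlTimings {
  // Rows: requireFieldMatch=true, requireFieldMatch=false.
  // Columns: weightMatches=true, weightMatches=false.
  static final int[][] MS = {
      {2458, 158},
      {3964, 169},
  };

  /** How many times faster fastMs is than slowMs. */
  static double speedup(int slowMs, int fastMs) {
    return (double) slowMs / fastMs;
  }

  public static void main(String[] args) {
    String[] requireFieldMatch = {"true", "false"};
    for (int i = 0; i < MS.length; i++) {
      System.out.printf("requireFieldMatch=%s: weightMatches=false is %.1fx faster%n",
          requireFieldMatch[i], speedup(MS[i][0], MS[i][1]));
    }
  }
}
```

This prints 15.6x and 23.5x: turning off hl.weightMatches is what matters, while hl.requireFieldMatch changes little once weightMatches is off (158 vs. 169 ms).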

Thanks again. I will also ask our other teams doing the Solr upgrade to
check CHANGES.txt for this.

Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

Posted by David Smiley <ds...@apache.org>.
Hello Kerwin,

Firstly, hopefully you've seen the upgrade notes:
https://lucene.apache.org/solr/guide/8_7/solr-upgrade-notes.html
8.6 fixes a performance regression found in 8.5; perhaps you are using 8.5?

Missing from the upgrade notes, but listed in CHANGES.txt for 8.0, is that
hl.weightMatches=true is now the default. Try setting it to false.
Does that help performance much?  It's documented on the highlighting page
of the ref guide:
https://lucene.apache.org/solr/guide/8_7/highlighting.html#the-unified-highlighter

You might want to try toggling hl.requireFieldMatch=true (it defaults to
false). For a dismax query it makes no semantic difference, since all
clauses target all fields, unless users know how to query specific fields
and do so. It may impact performance significantly when there are many
fields. Try a matrix toggling this and hl.weightMatches (2x2 = 4 tests).
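The suggested 2x2 matrix can be enumerated mechanically. A minimal sketch that produces the four parameter combinations as query-string fragments, meant to be appended to an otherwise unchanged request (`HlMatrix` is hypothetical; timing each request is left to whatever client you already use):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper enumerating the hl.requireFieldMatch x hl.weightMatches matrix. */
final class HlMatrix {

  static List<String> combinations() {
    List<String> out = new ArrayList<>();
    boolean[] values = {true, false};
    for (boolean requireFieldMatch : values) {
      for (boolean weightMatches : values) {
        out.add("hl.requireFieldMatch=" + requireFieldMatch
            + "&hl.weightMatches=" + weightMatches);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // One line per test run; 2 x 2 = 4 combinations.
    combinations().forEach(System.out::println);
  }
}
```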

~ David Smiley
