Posted to solr-user@lucene.apache.org by Matt Hilt <ma...@numerica.us> on 2015/03/02 23:48:39 UTC

Slow highlighting on Solr 5.0.0

Short form:
While testing Solr 5.0.0 in our staging environment, I noticed that highlight-enabled queries are much slower than they were with 4.10. Are there any obvious reasons why this might be the case? As far as I can tell, nothing has changed with the default highlight search component or its parameters.


A little more detail:
The bulk of the collection config set was stolen from the basic 4.X example config set. I changed my schema.xml and solrconfig.xml just enough to get 5.0 to create a new collection (removed non-trie fields, some other deprecated response handler definitions, etc). I can provide my version of the solr.HighlightComponent config, but it is identical to the sample_techproducts_configs example in 5.0.  Are there any other config files I could provide that might be useful?


Numbers on “much slower”:
I indexed a very small subset of my data into the new collection and used the /select interface to do a simple debug query. Solr 4.10 gives the following pertinent info:
"response": {
    "numFound": 72628,
...
"debug": {
 "timing": {
      "time": 95,
      "process": {
        "time": 94,
        "query": {
          "time": 6
        },
        "highlight": {
          "time": 84
        },
        "debug": {
          "time": 4
        }
      }
---------------------------
Whereas Solr 5.0 gives:
 "response": {
    "numFound": 1093,
...
  "debug": {   
   "timing": {
      "time": 6551,
      "process": {
        "time": 6549,
        "query": {
          "time": 0
        },
        "highlight": {
          "time": 6524
        },
        "debug": {
          "time": 25
        }
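
For anyone who wants to compare two such debug outputs side by side, here is a minimal Python sketch. It assumes the full JSON response has already been parsed into a dict and that the per-component entries sit under debug.timing.process, as in the excerpts above. The function name process_times and the hand-built sample dicts are illustrative only and are not part of Solr.

```python
def process_times(response):
    """Return {component: milliseconds} for the 'process' phase of a Solr debug response."""
    process = response["debug"]["timing"]["process"]
    # The 'process' object also carries its own total under "time"; keep only
    # the nested per-component objects (query, highlight, debug, ...).
    return {name: entry["time"]
            for name, entry in process.items()
            if isinstance(entry, dict)}


# Sample dicts built by hand from the timing excerpts in this message.
solr_4_10 = {"debug": {"timing": {"time": 95, "process": {
    "time": 94, "query": {"time": 6},
    "highlight": {"time": 84}, "debug": {"time": 4}}}}}
solr_5_0 = {"debug": {"timing": {"time": 6551, "process": {
    "time": 6549, "query": {"time": 0},
    "highlight": {"time": 6524}, "debug": {"time": 25}}}}}

old = process_times(solr_4_10)
new = process_times(solr_5_0)
for component in old:
    print(f"{component:>10}: {old[component]:>5} ms (4.10)  {new[component]:>5} ms (5.0)")
```

Printed this way, the highlight component stands out as the only one whose time changes by orders of magnitude.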




Re: Slow highlighting on Solr 5.0.0

Posted by Ere Maijala <er...@helsinki.fi>.
Thanks for the pointers. Using hl.usePhraseHighlighter=false does indeed 
make it a lot faster. Obviously it's not really a solution, though, 
since in 4.10 it wasn't a problem and turning it off has consequences. 
I'm looking forward to the improvements in the next releases.

--Ere



-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Slow highlighting on Solr 5.0.0

Posted by Matt Hilt <ma...@numerica.us>.
I've been looking into this again. The phrase highlighter is much slower
than the default highlighter, so you might be able to add
hl.usePhraseHighlighter=false to your query to make it faster. Note that
the web interface will NOT help here, because that param is true by default,
and the checkbox is basically broken in that respect. Also, the default
highlighter doesn't seem to work in all the cases the phrase highlighter
does.
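
Since the web interface checkbox does not help, the parameter has to go on the request itself. The following is a minimal Python sketch of building such a request URL. The host, the collection name "mycollection", the field "title", and the helper name highlight_query_url are placeholders chosen for illustration; only the parameter names (q, wt, hl, hl.fl, hl.usePhraseHighlighter, debugQuery) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, fields, use_phrase_highlighter=False):
    """Build a /select URL with highlighting on and the phrase highlighter set explicitly."""
    params = {
        "q": query,
        "wt": "json",
        "hl": "true",
        "hl.fl": fields,
        # Pass the value explicitly rather than relying on the default (true).
        "hl.usePhraseHighlighter": "true" if use_phrase_highlighter else "false",
        # Ask Solr to include the per-component timing block in the response.
        "debugQuery": "true",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and collection; adjust to your own setup.
url = highlight_query_url("http://localhost:8983/solr/mycollection",
                          "title:solr", "title")
print(url)
```

Running the same query once with use_phrase_highlighter=True and once with False, then comparing the highlight timings, should show how much of the slowdown comes from the phrase highlighter in a given setup.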

Also, the current development branch of 5x is much better than 5.1, but
not as good as 4.10. This ticket seems to be hitting on some of the issues
at hand:
https://issues.apache.org/jira/browse/SOLR-5855


I think this means they are getting there, but the performance is still
much worse than in 4.10, and it's not obvious why.



Re: Slow highlighting on Solr 5.0.0

Posted by Ere Maijala <er...@helsinki.fi>.
I'm seeing the same with Solr 5.1.0 after upgrading from 4.10.2. Here 
are my timings:

4.10.2:
process: 1432.0
highlight: 723.0

5.1.0:
process: 9570.0
highlight: 8790.0
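
To put these numbers in proportion, here is a small Python sketch that works out how much of the process time is spent highlighting in each version, and the slowdown of the highlight step. The dictionary layout and the helper name highlight_share are just for illustration; the figures are the ones listed above.

```python
# Timings in milliseconds, copied from the figures above.
timings = {
    "4.10.2": {"process": 1432.0, "highlight": 723.0},
    "5.1.0": {"process": 9570.0, "highlight": 8790.0},
}


def highlight_share(entry):
    """Fraction of the process time spent in the highlight component."""
    return entry["highlight"] / entry["process"]


for version, entry in timings.items():
    print(f"{version}: highlighting is {highlight_share(entry):.1%} of process time")

slowdown = timings["5.1.0"]["highlight"] / timings["4.10.2"]["highlight"]
print(f"highlight slowdown from 4.10.2 to 5.1.0: {slowdown:.1f}x")
```

Highlighting goes from roughly half of the process time to the large majority of it, which is why the overall query time grows so much.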

schema.xml and solrconfig.xml are available at 
https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf.

A couple of jstack outputs taken when the query was executing are 
available at http://pastebin.com/eJrEy2Wb

Any suggestions would be appreciated. Or would it make sense to just 
file a JIRA issue?

--Ere



-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland