Posted to solr-user@lucene.apache.org by "guenterh.lists@bluewin.ch" <gu...@bluewin.ch> on 2017/08/09 08:09:39 UTC

response time degradation with matchall queries / changing from SOLR 4.10 -> 6.x

Hi,
we are upgrading our Solr infrastructure from version 4.10.2 to the latest 6.6.
We are seeing a significant degradation in response time when running match-all queries with facets (query in [1]). With version 4.x, these kinds of queries never took longer than 2000 ms.
Now all of these queries take more than 9000 ms.
Our index [2] [3] contains around 30 million docs. Because we want to use docValues for facets and sort functions, we changed our document processing significantly, replacing all text types with string fields.
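For illustration, a string/docValues facet field of the kind described above could be declared through the Schema API. The Python sketch below only builds the request body (it would be POSTed to /solr/<collection>/schema). It assumes a managed schema with a `string` (solr.StrField) field type already defined, reuses the facet field names from the query in [1], and treats the stored/multiValued settings as placeholders, not the actual swissbib schema. Fields that already exist would need `replace-field` instead of `add-field`.

```python
import json

# Facet field names taken from the query in [1]; the attribute values
# below are illustrative assumptions, not the real swissbib schema.
FACET_FIELDS = ["union", "navAuthor_full", "format", "language",
                "navSub_green", "navSubform", "publishDate"]


def add_field_commands(fields, multi_valued=True):
    """Build a Schema API body defining each field as a docValues-enabled
    string field (assumes a fieldType named 'string' exists)."""
    return {
        "add-field": [
            {
                "name": name,
                "type": "string",       # solr.StrField: no text analysis
                "indexed": True,
                "stored": True,
                "docValues": True,      # column-oriented values for facets/sorting
                "multiValued": multi_valued,
            }
            for name in fields
        ]
    }


if __name__ == "__main__":
    print(json.dumps(add_field_commands(FACET_FIELDS), indent=2))
```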
The behavior of normal term queries is acceptable, although they are a little slower than in our current production environment. Yesterday I ran a couple of performance tests.
I looked around and came across this (older) issue [4], which is partially related to our observations, but I cannot find a solution for our case in it.
Did we miss something in the development from version 4 through 5 to 6 that might explain the degradation, and should we change our queries?
Thanks a lot for any hints.
Günter
[1] http://localhost:8080/solr/sb-biblio/select?rows=0&q=*:*&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&hl=true&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=20&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&facet=true&wt=xml&facet.sort=count
[2] www.swissbib.ch
[3] http://search.swissbib.ch/solr/sb-biblio/select?q=*%3A*&wt=xml&indent=true
[4] https://issues.apache.org/jira/browse/SOLR-8251
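To separate faceting cost from highlighting, boosting and scoring, the query in [1] could be reduced to its facet part. The sketch below is a hypothetical facet-only variant, not the query that was benchmarked: it keeps q=*:*, rows=0 and the facet parameters from [1], switches wt to json (an assumption, for easier parsing), and builds the URL with Python's urllib so the repeated facet.field parameters stay intact.

```python
from urllib.parse import urlencode

# Host, port and collection as in [1].
BASE = "http://localhost:8080/solr/sb-biblio/select"

FACET_FIELDS = ["union", "navAuthor_full", "format", "language",
                "navSub_green", "navSubform", "publishDate"]


def facet_only_params(fields, limit=100):
    """Match-all query with the facet settings from [1], but without
    highlighting, boost functions or score sorting."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
              ("facet.limit", str(limit)), ("facet.mincount", "1"),
              ("facet.sort", "count"), ("wt", "json")]
    # facet.field is repeated once per field, as in the original URL
    params += [("facet.field", f) for f in fields]
    return params


def facet_only_url(fields):
    return BASE + "?" + urlencode(facet_only_params(fields))


if __name__ == "__main__":
    print(facet_only_url(FACET_FIELDS))
```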

Re: response time degradation with matchall queries / changing from SOLR 4.10 -> 6.x

Posted by Günter Hipler <gu...@unibas.ch>.
Hi Erick,

thanks for your reply. I have investigated the cause of this behavior in
more depth but haven't been successful so far.
Answers to your questions:
- yes, I completely re-indexed the data
- yes, I'm running a collection of around 5,000 queries taken from our
production logs
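Extracting such replay queries from production logs might look roughly like this. The sketch assumes the usual Solr request log line shape (`path=/select params={...} hits=N status=0 QTime=M`); the prefix differs between versions and logging configurations, so the regular expression is an assumption to check against real log files. It keeps the raw parameter string, so it can be appended to a /select URL unchanged, together with the logged QTime.

```python
import re

# Assumed shape of a Solr request log line, e.g.
# ... path=/select params={q=*:*&rows=0&facet=true} hits=123 status=0 QTime=45
REQUEST_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*?)\}"
    r"(?: hits=\d+)? status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)


def extract_queries(lines, path="/select"):
    """Yield (raw_param_string, qtime_ms) for successful requests to `path`.
    The non-greedy match tolerates braces inside params, e.g. {!ex=...}."""
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group("path") == path and m.group("status") == "0":
            yield m.group("params"), int(m.group("qtime"))


if __name__ == "__main__":
    with open("solr.log", encoding="utf-8") as log:  # hypothetical file name
        for params, qtime in extract_queries(log):
            print(qtime, params)
```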

Here is the current state of my investigation:
1) a query on our current system (4.10) takes around 200 ms to process
facets on a larger result set (here just one example):
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&indent=on&q.alt=*:*&ps=2&hl=true&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&q.op=AND&hl.simple.pre={{{{START_HILITE}}}}&qf=title_short^1000+title_alt^200+title^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+author_additional_gnd_txt_mv^100+title_additional_gnd_txt_mv^100+publplace_additional_gnd_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+cancisbn_isn_mv+variant_isbn_isn_mv+issn+incoissn_isn_mv+localcode+id&hl.fl=fulltext&wt=xml&mm=100%25&facet.field={!ex%3Dunion_filter}union&facet.field={!ex%3DnavAuthor_full_filter}navAuthor_full&facet.field={!ex%3Dformat_hierarchy_str_mv_filter}format_hierarchy_str_mv&facet.field={!ex%3Dlanguage_filter}language&facet.field=navSub_green&facet.field={!ex%3DnavSubform_filter}navSubform&facet.field=publishDate&qt=edismax&json.nl=arrarr&start=0&sort=score+desc&rows=0&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&pf=title_short^1000&facet.mincount=1&facet=true&facet.sort=count

while the same query on 6.x takes more than 4000 ms, and not uncommonly
more than 10000 ms:
https://gist.github.com/guenterh/8032bddd9bfce31324d1a8651b8d282b
(the server is not publicly available)

2) I tried several Solr 6 versions (6.3 through 6.6), because other
(library) networks running big indexes reported that they had faceting
problems too, and one of them solved them with 6.3

3) I tried both the way we built our old index schema (facet fields based
on text types) and a schema with string fields for docValues (the way we
want to go in the future), but had the same problems with both

4) I experimented with the new facet.method options
(https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Thefacet.methodParameter 
- not available in version 4) but wasn't able to improve the results.
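A sketch of how the facet.method options could be compared one at a time, using Solr's per-field override syntax (f.<field>.facet.method). The method names are assumed from the 6.6 faceting guide linked above (enum, fc, fcs, and uif, the newer UnInvertedField option); whether a given method is actually applied to a given field type is up to Solr, so each variant still has to be timed, e.g. with debug=timing.

```python
# facet.method values assumed from the Solr 6.6 faceting guide;
# Solr decides whether a method actually applies to a given field.
METHODS = ["fc", "fcs", "enum", "uif"]


def method_variants(base_params, fields, methods=METHODS):
    """Return {method: params}: the base query plus an
    f.<field>.facet.method override for every facet field."""
    variants = {}
    for method in methods:
        params = list(base_params)  # copy so variants don't share a list
        params += [(f"f.{field}.facet.method", method) for field in fields]
        variants[method] = params
    return variants


if __name__ == "__main__":
    base = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
            ("facet.field", "navAuthor_full"), ("debug", "timing")]
    for method, params in method_variants(base, ["navAuthor_full"]).items():
        print(method, params[-1])
```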

I have the impression that something changed significantly in the way
facets are processed, but unfortunately I can't figure out how to keep our
use case from being as badly affected as it is now.

Thanks for hints!

Günter


On 09.08.2017 17:22, Erick Erickson wrote:
> Two questions:
>
> 1> Did you completely re-index under 6.x? My guess is "yes", since you
> jumped two major versions and 6.x won't read a 4.x index. If not, you may
> be getting some performance degradation due to back-compat.
>
> 2> Try turning on &debug=timing. That breaks down the time spent in each
> component and may give a clue. Highlighting has changed significantly,
> so that's one place I'd look.
>
> And I'm assuming you're running a suite of tests; trying just a few
> queries is uninformative, since their timings are dominated by loading
> parts of the index into memory.
>
> Best,
> Erick

-- 
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hipler@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/



Re: response time degradation with matchall queries / changing from SOLR 4.10 -> 6.x

Posted by Erick Erickson <er...@gmail.com>.
Two questions:

1> Did you completely re-index under 6.x? My guess is "yes", since you
jumped two major versions and 6.x won't read a 4.x index. If not, you may
be getting some performance degradation due to back-compat.

2> Try turning on &debug=timing. That breaks down the time spent in each
component and may give a clue. Highlighting has changed significantly,
so that's one place I'd look.
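With wt=json, the timing section of the debug output can be sorted to see which component dominates. This is a minimal sketch assuming the usual layout of debug=timing output (debug -> timing -> prepare/process -> component name -> time in ms); component names such as query, facet or highlight depend on the configured search components, and the sample numbers are made up purely to show the shape.

```python
def slowest_components(response, phase="process"):
    """Return (component, ms) pairs from debug -> timing -> <phase>,
    slowest first. `response` is a parsed wt=json Solr response."""
    phase_timing = response["debug"]["timing"][phase]
    pairs = [
        (name, info["time"])
        for name, info in phase_timing.items()
        # skip the phase total ("time": <float>), keep per-component dicts
        if isinstance(info, dict) and "time" in info
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, only to show the expected response shape.
    sample = {"debug": {"timing": {
        "time": 6.0,
        "prepare": {"time": 0.0, "query": {"time": 0.0}},
        "process": {"time": 6.0, "query": {"time": 3.0},
                    "facet": {"time": 2.0}, "highlight": {"time": 1.0}},
    }}}
    for name, ms in slowest_components(sample):
        print(f"{name:>10} {ms:8.1f} ms")
```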

And I'm assuming you're running a suite of tests; trying just a few
queries is uninformative, since their timings are dominated by loading
parts of the index into memory.
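The warm-up point above can be sketched as a small replay harness: it times each query, discards an initial warm-up slice, and reports the median and 95th percentile of the rest. `run_query` is a hypothetical callable (for example a wrapper around an HTTP GET against /select), and the warm-up size of 100 is an arbitrary assumption, not a Solr recommendation. Requires Python 3.8+ for statistics.quantiles.

```python
import statistics
import time


def benchmark(run_query, queries, warmup=100):
    """Time run_query(q) for every q, drop the first `warmup` timings
    (caches and the OS page cache are still filling), summarise the rest."""
    timings_ms = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)  # hypothetical callable, e.g. an HTTP GET to /select
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    measured = timings_ms[warmup:]
    if len(measured) < 2:
        raise ValueError("not enough queries left after the warm-up slice")
    percentiles = statistics.quantiles(measured, n=100)  # 99 cut points
    return {
        "count": len(measured),
        "median_ms": statistics.median(measured),
        "p95_ms": percentiles[94],
    }


if __name__ == "__main__":
    # Stand-in for a real Solr call, just to exercise the harness.
    print(benchmark(lambda q: sum(range(10_000)), range(500)))
```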

Best,
Erick
