Posted to solr-user@lucene.apache.org by "guenterh.lists@bluewin.ch" <gu...@bluewin.ch> on 2017/08/21 13:35:05 UTC

facet processing module in Version 6.x needs significantly more time compared to version 4.10

Hi,
I can't figure out the reason why the facet processing in version 6 needs significantly more time compared to version 4.
The debugging response (for 30 million documents)
solr 4
<lst name="process"><double name="time">280.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">280.0</double></lst>
(once the query is cached)
before caching: between 1.5 and 2 sec
solr 6.x (my last try was with 6.6)
without docvalues for faceting fields (same schema as version 4)
<lst name="process"><double name="time">5874.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">5873.0</double></lst><lst name="facet_module"><double name="time">0.0</double></lst>
the time does not improve even after repeating the query several times
solr 6.6 with docvalues for faceting fields
<lst name="process"><double name="time">9837.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">9837.0</double></lst><lst name="facet_module"><double name="time">0.0</double></lst>
query used (on our production system, which runs version 4)
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
Running the queries on smaller indices (8 million docs), the difference is similar, although the absolute figures for processing time are smaller.
Any hints as to why there is such a huge difference?
Günter

Re: slow solr facet processing

Posted by Günter Hipler <gu...@unibas.ch>.
Yonik, thanks for the hint with the uif facet method.
(btw: why isn't it part of the official documentation? - at least I 
haven't found it)
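
For anyone who wants to try the same hint, here is a minimal sketch of how the per-field parameter could be added to a request. The host, port and helper name are placeholders made up for illustration; only facet.field and the f.<field>.facet.method=uif override come from this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with your own host and core.
BASE_URL = "http://localhost:8983/solr/sb-biblio/select"


def build_facet_query(fields, uif_fields=()):
    """Return a select URL that facets on `fields`, using uif for `uif_fields`."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("facet.limit", "100"),
    ]
    for field in fields:
        params.append(("facet.field", field))
    for field in uif_fields:
        # Per-field override, as suggested for the "union" field.
        params.append((f"f.{field}.facet.method", "uif"))
    return BASE_URL + "?" + urlencode(params)


url = build_facet_query(["union", "format"], uif_fields=["union"])
print(url)
```

Keeping the override per field makes it easy to compare one field with and without uif in the same request.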

For our use case this means:
The time for facet processing is exactly the same as it was with version 4, 
but this works only for indexes without docvalues.
I tested two indexes with 30 million docs which are identical except for 
one difference:
a) uses docvalues for the faceting fields
b) no docvalues

In both, the faceting fields are multivalued.

with a) I get faceting response times of around 9000 ms
with b) around 200 ms

I'm really happy that you restarted the discussion about 
https://issues.apache.org/jira/browse/SOLR-8096 yesterday.

I can only second Shawn Heisey's comment:
"If I had any understanding of how this code worked and the precise 
reasons it has become slower, I would be working on a solution."

Faceting may be an old feature, and perhaps the first well-known feature 
of Solr, but it is the most important one.

Günter


On 31.08.2017 19:04, Yonik Seeley wrote:
> A possible improvement for some multiValued fields might be to use the
> "uif" facet method (UnInvertedField was the default method for
> multiValued fields in 4.x)
> I'm not sure if you would need to reindex without docValues on that
> field to try it though.
>
> Example: to enable on the "union" field, add f.union.facet.method=uif
>
> Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466
>
> -Yonik
>
>
> On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
> <gu...@unibas.ch> wrote:
>> Hi,
>>
>> in the meantime I came across the reason for the slow facet processing
>> capacities of SOLR since version 5.x
>>
>>   https://issues.apache.org/jira/browse/SOLR-8096
>> https://issues.apache.org/jira/browse/LUCENE-5666
>>
>> compared to version 4.x
>>
>> Various library networks across the world are suffering from the same
>> symptoms:
>>
>> Facet processing is one of the most important features of a search server
>> (for us) and it seems (at least IMHO) there is no solution for the issue
>> since March 2015 (release date for the last SOLR 4 version)
>>
>> What are the plans / ideas of the solr developers for a possible future
>> solution? Or maybe there is already a solution I haven't seen so far.
>>
>> Thanks for a feedback
>>
>> Günter

-- 
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hipler@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/


Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
Toke Eskildsen wrote on 4.9.2017 at 13.38:
> On Mon, 2017-09-04 at 13:21 +0300, Ere Maijala wrote:
>> Thanks for the insight, Yonik. I can confirm that #2 is true. I ran
>>
>> <optimize maxSegments="1" waitSearcher="true"/>
>>
>> and after it completed I was able to retrieve 2000 values in 17ms.
> 
> Very interesting. Is this on spinning disks or SSD? Is your index data
> cached in memory? What I am aiming at is if this is primarily a "many
> relatively slow random access"-thing or more due to the way DocValues
> are represented in the segments (the codec).

I indexed a few million new/changed records, and the performance is back 
to slow. Upside is that I can test again with a slow server.

It's spinning disks on a SAN, and the full index doesn't fit into 
memory. I don't see any IO wait, and repeated attempts are just as slow 
even though I would have thought the relevant parts would be cached in 
memory. During testing and reporting the results I've always discarded 
the very first requests since they're always slower than subsequent 
repeats due to there being another test index on the same server. Maybe 
worth noting is that while there's no IO wait, there is fairly high CPU 
usage for Solr's Java process hovering around 100% if I repeat the 
request in a loop.

I took a quick sample with VisualVM, and the top hotspots are:

org.apache.solr.search.facet.UnInvertedField.getCounts()  32.079956  7,356 ms (32.1%)  7,356 ms  7,655 ms  7,655 ms
org.apache.lucene.util.PriorityQueue.downHeap()  30.232546  6,932 ms (30.2%)  6,932 ms  6,932 ms  6,932 ms
org.apache.lucene.index.MultiTermsEnum.pushTop()  11.628195  2,666 ms (11.6%)  2,666 ms  11,177 ms  11,177 ms
org.apache.lucene.index.MultiTermsEnum$TermMergeQueue.fillTop()  9.079571  2,082 ms (9.1%)  2,082 ms  2,082 ms  2,082 ms
org.apache.lucene.store.ByteBufferGuard.getBytes()  4.176216  957 ms (4.2%)  957 ms  957 ms  957 ms
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next()  2.6867974  616 ms (2.7%)  616 ms  616 ms  616 ms
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermLeaf()  1.393562  319 ms (1.4%)  319 ms  319 ms  319 ms
org.apache.lucene.util.fst.ByteSequenceOutputs.read()  1.2111844  277 ms (1.2%)  277 ms  277 ms  277 ms

(sorry if that looks bad in the email)
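
The heap and MultiTermsEnum entries above fit the explanation given elsewhere in this thread that ord->string conversion has to merge term enumerators from several segments. A toy sketch of that kind of k-way merge follows. It is not Lucene's code: the segment contents are made up, and Python's heapq stands in for the merge queue.

```python
import heapq


def merged_terms(segments):
    """Yield distinct terms in sorted order from several sorted per-segment term lists."""
    previous = None
    # heapq.merge keeps one candidate per segment in a heap, much like a merge queue.
    for term in heapq.merge(*segments):
        if term != previous:
            yield term
            previous = term


def term_for_ord(segments, ord_):
    """Return the term with global ordinal `ord_` by walking the merged enumeration."""
    for i, term in enumerate(merged_terms(segments)):
        if i == ord_:
            return term
    raise IndexError(ord_)


# Made-up example: the same six terms in one segment versus three segments.
one_segment = [["a", "b", "c", "d", "e", "f"]]
three_segments = [["a", "d"], ["b", "e", "f"], ["c", "d"]]

print(term_for_ord(one_segment, 4))
print(term_for_ord(three_segments, 4))
```

Both calls find the same term, but with three segments every step goes through the heap, which is the kind of extra work a single-segment index avoids.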

I'm building another index on a higher-end server that can load the full 
index to memory and will retest with that. But note that this index has 
docValues disabled as facet.method=uif seems to only cause trouble if 
docValues are enabled.

--Ere

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
Yonik Seeley wrote on 4.9.2017 at 18.03:
> It's due to this (see comments in UnInvertedField):
> *   To further save memory, the terms (the actual string values) are
> not all stored in
> *   memory, but a TermIndex is used to convert term numbers to term values only
> *   for the terms needed after faceting has completed.  Only every
> 128th term value
> *   is stored, along with its corresponding term number, and this is used as an
> *   index to find the closest term and iterate until the desired number is hit
> 
> There's probably a number of ways we can speed this up somewhat:
> - optimize how much memory is used to store the term index and use the
> savings to store more than every 128th term
> - store the terms contiguously in block(s)
> - don't store the whole term, only store what's needed to seek to the
> Nth term correctly
> - when retrieving many terms, sort them first and convert from ord->str in order

For what it's worth, I've now tested on our production servers that can 
hold the full index in memory, and the results are in line with the 
previous ones (47 million records, 1785 buckets in the tested facet):

1.) index with docValues="true":

- unoptimized: ~6000ms if facet.method is not specified
- unoptimized: ~7000ms with facet.method=uif
- optimized: ~7800ms if facet.method is not specified
- optimized: ~7700ms with facet.method=uif

Note that optimization took its time and other activity varies 
throughout the day, so the numbers between optimized and unoptimized 
cannot be directly compared. Still bugs me a bit that the optimized 
index seems to be a bit slower here.

2.) index with docValues="false":

- unoptimized: ~2600ms if facet.method is not specified
- unoptimized: ~1200ms with facet.method=uif
- optimized: ~2600ms if facet.method is not specified
- optimized: ~17ms with facet.method=uif

--Ere

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: slow solr facet processing

Posted by Yonik Seeley <ys...@gmail.com>.
The number-of-segments noise probably swamps this... but one
optimization around deep-facet-paging that didn't get carried forward
is
https://issues.apache.org/jira/browse/SOLR-2092

-Yonik


On Tue, Sep 5, 2017 at 6:49 AM, Toke Eskildsen <to...@kb.dk> wrote:
> On Mon, 2017-09-04 at 11:03 -0400, Yonik Seeley wrote:
>> It's due to this (see comments in UnInvertedField):
>
> I have read that. What I don't understand is the difference between 4.x
> and 6.x. But as you say, Ere seems to be in the process of verifying
> whether this is simply due to more segments in 6.x.
>
>> There's probably a number of ways we can speed this up somewhat:
>> - optimize how much memory is used to store the term index and use
>> the savings to store more than every 128th term
>> - store the terms contiguously in block(s)
>
> I'm considering taking a shot at that. A fairly easy optimization would
> be to replace the BytesRef[] indexedTermsArray with a BytesRefArray.
>
> - Toke Eskildsen, Royal Danish Library
>

Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
Toke Eskildsen wrote on 5.9.2017 at 13.49:
> On Mon, 2017-09-04 at 11:03 -0400, Yonik Seeley wrote:
>> It's due to this (see comments in UnInvertedField):
> 
> I have read that. What I don't understand is the difference between 4.x
> and 6.x. But as you say, Ere seems to be in the process of verifying
> whether this is simply due to more segments in 6.x.

During my testing I never optimized the 4.x index, so unless it 
maintains a minimal number of segments automatically, there's something 
else too.

>> There's probably a number of ways we can speed this up somewhat:
>> - optimize how much memory is used to store the term index and use
>> the savings to store more than every 128th term
>> - store the terms contiguously in block(s)
> 
> I'm considering taking a shot at that. A fairly easy optimization would
> be to replace the BytesRef[] indexedTermsArray with a BytesRefArray.

I'd be happy to try out any patches.. :)

--Ere

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: slow solr facet processing

Posted by Toke Eskildsen <to...@kb.dk>.
On Mon, 2017-09-04 at 11:03 -0400, Yonik Seeley wrote:
> It's due to this (see comments in UnInvertedField):

I have read that. What I don't understand is the difference between 4.x
and 6.x. But as you say, Ere seems to be in the process of verifying
whether this is simply due to more segments in 6.x.

> There's probably a number of ways we can speed this up somewhat:
> - optimize how much memory is used to store the term index and use
> the savings to store more than every 128th term
> - store the terms contiguously in block(s)

I'm considering taking a shot at that. A fairly easy optimization would
be to replace the BytesRef[] indexedTermsArray with a BytesRefArray.

- Toke Eskildsen, Royal Danish Library


Re: slow solr facet processing

Posted by Yonik Seeley <ys...@gmail.com>.
On Mon, Sep 4, 2017 at 6:38 AM, Toke Eskildsen <to...@kb.dk> wrote:
> On Mon, 2017-09-04 at 13:21 +0300, Ere Maijala wrote:
>> Thanks for the insight, Yonik. I can confirm that #2 is true. I ran
>>
>> <optimize maxSegments="1" waitSearcher="true"/>
>>
>> and after it completed I was able to retrieve 2000 values in 17ms.
>
> Very interesting. Is this on spinning disks or SSD? Is your index data
> cached in memory? What I am aiming at is if this is primarily a "many
> relatively slow random access"-thing or more due to the way DocValues
> are represented in the segments (the codec).

It's due to this (see comments in UnInvertedField):
*   To further save memory, the terms (the actual string values) are
not all stored in
*   memory, but a TermIndex is used to convert term numbers to term values only
*   for the terms needed after faceting has completed.  Only every
128th term value
*   is stored, along with its corresponding term number, and this is used as an
*   index to find the closest term and iterate until the desired number is hit

There's probably a number of ways we can speed this up somewhat:
- optimize how much memory is used to store the term index and use the
savings to store more than every 128th term
- store the terms contiguously in block(s)
- don't store the whole term, only store what's needed to seek to the
Nth term correctly
- when retrieving many terms, sort them first and convert from ord->str in order
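
As a rough illustration of the scheme described in that comment (not the actual UnInvertedField code), a sparse term index could look like the sketch below. The class name, the in-memory list standing in for the term dictionary, and the step counter are all assumptions made for the example.

```python
from bisect import bisect_right


class SparseTermIndex:
    """Keeps every `interval`-th term number in memory; other terms are found by scanning."""

    def __init__(self, sorted_terms, interval=128):
        self._terms = sorted_terms  # stands in for the on-disk term dictionary
        # Term numbers (ordinals) of the indexed terms: 0, interval, 2*interval, ...
        self._indexed_ords = list(range(0, len(sorted_terms), interval))

    def lookup(self, ord_):
        """Return (term, steps), where steps is how many terms were iterated past."""
        if not 0 <= ord_ < len(self._terms):
            raise IndexError(ord_)
        # Find the closest indexed term at or before ord_.
        slot = bisect_right(self._indexed_ords, ord_) - 1
        position = self._indexed_ords[slot]
        steps = 0
        while position < ord_:  # iterate until the desired number is hit
            position += 1
            steps += 1
        return self._terms[position], steps

    def lookup_many(self, ords):
        """Resolve ordinals in ascending order. Each lookup here is still independent;
        a real implementation could reuse the scan position between neighbours."""
        return {o: self.lookup(o)[0] for o in sorted(ords)}


terms = [f"term{n:05d}" for n in range(1000)]
index = SparseTermIndex(terms, interval=128)
print(index.lookup(130))
print(index.lookup(255))
```

With an interval of 128 a single lookup scans up to 127 terms, which is why retrieving a few thousand buckets can add up.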

-Yonik

Re: slow solr facet processing

Posted by Toke Eskildsen <to...@kb.dk>.
On Mon, 2017-09-04 at 13:21 +0300, Ere Maijala wrote:
> Thanks for the insight, Yonik. I can confirm that #2 is true. I ran
> 
> <optimize maxSegments="1" waitSearcher="true"/>
> 
> and after it completed I was able to retrieve 2000 values in 17ms.

Very interesting. Is this on spinning disks or SSD? Is your index data
cached in memory? What I am aiming at is if this is primarily a "many
relatively slow random access"-thing or more due to the way DocValues
are represented in the segments (the codec).

- Toke Eskildsen, Royal Danish Library


Re: slow solr facet processing

Posted by Erick Erickson <er...@gmail.com>.
Ere:

This is an excellent summary, it conforms to what I think I know, it's
always nice to see confirmation!

I'd add two small enhancements. Your point 5 mentions sorting. The same
consideration is true for grouping and faceting as well. What all three
have in common is that they answer the question "for document X, what is
the value(s) of field Y?" which inverted indexes don't handle efficiently.

The second enhancement is an additional caveat for point 7. Not only are
multivalued fields returned in sorted order, if multiple identical values
exist they are collapsed into one (it's a sorted set). So input of
4,5,3,3,3 would return just 3,4,5.
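
The same behaviour in a couple of lines of Python, purely as an illustration of sorted-set semantics (the function name is made up, and no Solr code is involved):

```python
def docvalues_view(values):
    """Mimic how multivalued docValues read back: distinct values in sorted order."""
    return sorted(set(values))


print(docvalues_view([4, 5, 3, 3, 3]))  # [3, 4, 5]
```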

About point 8. I'm thinking lately that a better option would be to use
some of the enhanced metrics being put in place for autoscaling to
intelligently route queries (and sub-queries) to nodes that currently have
the most unused (CPU?) capacity. The problem I have with routing only to
PULL or TLOG replicas is that your NRT replicas can sit idle; it's a crude
hammer. Consider periodic bulk indexing, i.e. your index changes once every
hour for 10 minutes. Your NRT nodes would sit idle for 50 minutes/hour. I'd
rather see effort there than specifying a subset of nodes as search nodes,
WDYT?

Again, thanks for taking the time to write up your summary!
Erick


On Fri, Jan 5, 2018 at 6:15 AM, Ere Maijala <er...@helsinki.fi> wrote:

> Hi Everyone,
>
> This is a followup on the discussion from September 2017. Since then I've
> spent a lot of time gathering a better understanding on docValues compared
> to UIF and other stuff related to Solr performance. Here's a summary of the
> results based on my real-world experience:
>
> 1. Making sure Solr needs as little Java heap as possible is crucial.
>
> 2. UIF requires a lot of Java heap. With a larger index it becomes
> impractical, since Java GC can't easily keep up with the heaps required.
>
> 3. UIF is really fast, but only after serious warmup. DocValues work
> better if the index is updated regularly, since same level of warmup is not
> needed.
>
> 4. DocValues, taking advantage of memory-mapped files, don't have the
> above problem, and after moving to all-docValues we have been able to
> reduce the Java heap from 31G to 6G. This is pretty significant, since it
> means we don't have to deal with long GC pauses.
>
> 5. Make sure docValues are enabled also for all fields used for sorting.
> This helps avoid spending memory on field cache. Without docValues we could
> easily have 2 GB of field cache entries.
>
> 5. It seems that having docValues for the id field is useful too. For now
> stored needs to remain true too (see https://issues.apache.org/jira
> /browse/SOLR-10816).
>
> 6. Sharding the index helps faceting with docValues perform more work in
> parallel and results in a lot better performance. This doesn't seem to
> negatively affect the overall performance (at least enough to be
> perceived), and it seems that splitting our index to three shards resulted
> in speedup that's better than previous performance divided by three. There
> is a caveat [1], though.
>
> 7. In many cases fields that have docValues enabled can be switched from
> stored="true" to stored="false" since Solr can fetch the contents from
> docValues. A notable exception is multivalued fields where the order of the
> values is important. This means that enabling docValues doesn't add to the
> index size significantly.
>
> 8. Different replica types available in Solr 7 are really useful in
> reducing the CPU time spent indexing records. I'd still like to have a way
> to have PULL replicas with NRT replicas so that only the PULL replicas
> handle search queries.
>
> 9. Lastly, a lot can be done on the application level. For instance in our
> case many users don't care about facets or only use a couple of them, so we
> fetch them asynchronously as needed and collapse most by default without
> fetching them at all. This lowers the server load significantly (I'll work
> on contributing the option to upstream VuFind).
>
>
> I hope this helps others make informed choices.
>
> --Ere
>
>
> [1] Care must be taken to avoid requests that cause Solr to fetch a lot of
> rows at once from each shard, since that blows up the memory usage wreaking
> havoc in Solr. One particular case that, at first sight, doesn't look too
> dangerous, is deep paging without a cursor (Yonik has a good explanation of
> this at http://yonik.com/solr/paging-and-deep-paging/).
>

Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
Hi Everyone,

This is a follow-up to the discussion from September 2017. Since then 
I've spent a lot of time gaining a better understanding of docValues 
compared to UIF and of other things related to Solr performance. Here's a 
summary of the results based on my real-world experience:

1. Making sure Solr needs as little Java heap as possible is crucial.

2. UIF requires a lot of Java heap. With a larger index it becomes 
impractical, since Java GC can't easily keep up with the heaps required.

3. UIF is really fast, but only after serious warmup. DocValues work 
better if the index is updated regularly, since the same level of warmup 
is not needed.

4. DocValues, taking advantage of memory-mapped files, don't have the 
above problem, and after moving to all-docValues we have been able to 
reduce the Java heap from 31G to 6G. This is pretty significant, since 
it means we don't have to deal with long GC pauses.

5. Make sure docValues are also enabled for all fields used for sorting. 
This helps avoid spending memory on the field cache. Without docValues we 
could easily have 2 GB of field cache entries.

5. It seems that having docValues for the id field is useful too. For 
now stored needs to remain true too (see 
https://issues.apache.org/jira/browse/SOLR-10816).

6. Sharding the index lets faceting with docValues do more work in 
parallel and results in much better performance. This doesn't seem to 
negatively affect the overall performance (at least not enough to be 
noticeable), and it seems that splitting our index into three shards 
resulted in a speedup that's better than the previous response time 
divided by three. There is a caveat [1], though.

7. In many cases fields that have docValues enabled can be switched from 
stored="true" to stored="false" since Solr can fetch the contents from 
docValues. A notable exception is multivalued fields where the order of 
the values is important. This means that enabling docValues doesn't add 
to the index size significantly.

8. Different replica types available in Solr 7 are really useful in 
reducing the CPU time spent indexing records. I'd still like to have a 
way to have PULL replicas with NRT replicas so that only the PULL 
replicas handle search queries.

9. Lastly, a lot can be done on the application level. For instance in 
our case many users don't care about facets or only use a couple of 
them, so we fetch them asynchronously as needed and collapse most by 
default without fetching them at all. This lowers the server load 
significantly (I'll work on contributing the option to upstream VuFind).


I hope this helps others make informed choices.

--Ere


[1] Care must be taken to avoid requests that cause Solr to fetch a lot 
of rows at once from each shard, since that blows up the memory usage 
wreaking havoc in Solr. One particular case that, at first sight, 
doesn't look too dangerous, is deep paging without a cursor (Yonik has a 
good explanation of this at http://yonik.com/solr/paging-and-deep-paging/).
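
A minimal sketch of paging with a cursor instead of a growing start offset is shown below. It assumes the uniqueKey field is called id, and fetch_page is a hypothetical callable that sends the parameters to Solr and returns the parsed JSON response; only cursorMark and nextCursorMark are Solr names.

```python
def iterate_with_cursor(fetch_page, rows=100):
    """Yield all documents using cursorMark instead of a growing start offset.

    `fetch_page(params)` is a hypothetical callable that sends `params` to Solr
    and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # initial cursorMark value
    while True:
        params = {
            "q": "*:*",
            "rows": rows,
            "sort": "id asc",  # the sort must include the uniqueKey field (assumed "id")
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means there are no more results
            break
        cursor = next_cursor
```

Each request asks every shard for only `rows` documents, so memory use stays flat no matter how deep the paging goes.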

Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
Yonik Seeley wrote on 1.9.2017 at 17.03:
 > On Fri, Sep 1, 2017 at 9:17 AM, Ere Maijala <er...@helsinki.fi> wrote:
 >> I spoke a bit too soon. Now I see why I didn't see any improvement from
 >> facet.method=uif before: its performance seems to depend heavily on how many
 >> facets are returned. With an index of 6 million records and the facet having
 >> 1960 buckets:
 >>
 >> facet.limit=20 takes 4ms
 >> facet.limit=200 takes ~100ms
 >> facet.limit=2000 takes ~1300ms
 >>
 >> So, for some uses it provides a nice boost, but if you need to fetch more
 >> than a few top items, it doesn't perform properly.
 >
 > Another thought on this one:
 > If it does slow down more than 4.x when requesting many items, it's either
 > 1) a bug introduced at some point
 > 2) not actually slower, but due to the 6.6 index having more segments
 > (ord->string conversion needs to merge multiple term enumerators, so
 > more segments == slower)
 >
 > If you could check #2, that would be great!  If it doesn't seem to be
 > the problem, could you open up a new JIRA issue for this?
 >
Thanks for the insight, Yonik. I can confirm that #2 is true. I ran

<optimize maxSegments="1" waitSearcher="true"/>

and after it completed I was able to retrieve 2000 values in 17ms.

Does this mean we should have a very aggressive merge policy? That's 
something I haven't tweaked, and it's not quite clear to me what would 
be the best way to achieve a consistently low number of segments.
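
For reference, a minimal sketch of how the optimize command above could be sent to the update handler from a script. The host and core name are placeholders, and the request is only built here, not sent.

```python
from urllib.request import Request

# Hypothetical host and core name; adjust to your installation.
UPDATE_URL = "http://localhost:8983/solr/biblio/update"


def build_optimize_request(max_segments=1):
    """Build (but do not send) an XML update request that merges down to `max_segments`."""
    body = f'<optimize maxSegments="{max_segments}" waitSearcher="true"/>'
    return Request(
        UPDATE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_optimize_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually run it: urllib.request.urlopen(request)
```

On a large index the merge itself can take a long time, so this is something to schedule rather than run after every update.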

I encountered one issue with some further testing. I assume this is a 
bug: Trying to use facet.method=uif with a solr.DateRangeField causes 
the following exception:

2017-09-04 12:50:33.246 ERROR (qtp1205044462-18602) [   x:biblio2] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Exception during facet.field: search_daterange_mv
        at org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:809)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.request.SimpleFacets$3.execute(SimpleFacets.java:742)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:818)
        at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:326)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:274)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:304)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.IllegalStateException: instead call createFields() because isPolyField() is true
        at org.apache.solr.schema.AbstractSpatialFieldType.createField(AbstractSpatialFieldType.java:204)
        at org.apache.solr.schema.AbstractSpatialFieldType.createField(AbstractSpatialFieldType.java:73)
        at org.apache.solr.schema.FieldType.toObject(FieldType.java:385)
        at org.apache.solr.search.facet.FacetFieldProcessorByArray.lambda$calcFacets$0(FacetFieldProcessorByArray.java:113)
        at org.apache.solr.search.facet.FacetFieldProcessor.findTopSlots(FacetFieldProcessor.java:333)
        at org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:110)
        at org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:58)
        at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:460)
        at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:407)
        at org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:544)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:405)
        at org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:803)

--Ere

 > -Yonik
 >
 >
 >> Query used was:
 >>
 >> 
q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif
 >>
 >> --Ere
 >>
 >>
-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: slow solr facet processing

Posted by Yonik Seeley <ys...@gmail.com>.
On Fri, Sep 1, 2017 at 9:17 AM, Ere Maijala <er...@helsinki.fi> wrote:
> I spoke a bit too soon. Now I see why I didn't see any improvement from
> facet.method=uif before: its performance seems to depend heavily on how many
> facets are returned. With an index of 6 million records and the facet having
> 1960 buckets:
>
> facet.limit=20 takes 4ms
> facet.limit=200 takes ~100ms
> facet.limit=2000 takes ~1300ms
>
> So, for some uses it provides a nice boost, but if you need to fetch more
> than a few top items, it doesn't perform properly.

Another thought on this one:
If it does slow down more than 4.x when requesting many items, it's either
1) a bug introduced at some point
2) not actually slower, but due to the 6.6 index having more segments
(ord->string conversion needs to merge multiple term enumerators, so
more segments == slower)

If you could check #2, that would be great!  If it doesn't seem to be
the problem, could you open up a new JIRA issue for this?

-Yonik
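
To make point 2 concrete, here is a simplified model of why more segments can make ord->string conversion slower. It is only a sketch under stated assumptions, not Lucene's actual implementation: each segment is assumed to hold its own sorted term list, and the helper names merged_terms and term_for_global_ord are hypothetical. Resolving a global ordinal then means walking a merged view of all per-segment lists, so every step touches one enumerator per segment.

```python
import heapq

def merged_terms(segments):
    """Yield distinct terms in sorted order from per-segment sorted term lists."""
    last = None
    # heapq.merge keeps one cursor per segment, so each step costs more
    # as the number of segments grows.
    for term in heapq.merge(*segments):
        if term != last:
            yield term
            last = term

def term_for_global_ord(segments, ord_):
    """Walk the merged enumeration until the requested global ordinal."""
    for i, term in enumerate(merged_terms(segments)):
        if i == ord_:
            return term
    raise IndexError(ord_)

# Hypothetical three-segment index; a fully merged index would have one list.
segments = [["art", "law", "music"], ["history", "law"], ["art", "zoology"]]
print(list(merged_terms(segments)))
print(term_for_global_ord(segments, 3))
```

With a single segment the merge degenerates to a plain scan of one list, which is why comparing against an index merged down to one segment is a useful check.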


> Query used was:
>
> q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif
>
> --Ere
>
>

Re: slow solr facet processing

Posted by Yonik Seeley <ys...@gmail.com>.
On Fri, Sep 1, 2017 at 9:17 AM, Ere Maijala <er...@helsinki.fi> wrote:
> I spoke a bit too soon. Now I see why I didn't see any improvement from
> facet.method=uif before: its performance seems to depend heavily on how many
> facets are returned. With an index of 6 million records and the facet having
> 1960 buckets:
>
> facet.limit=20 takes 4ms
> facet.limit=200 takes ~100ms
> facet.limit=2000 takes ~1300ms
>
> So, for some uses it provides a nice boost, but if you need to fetch more
> than a few top items, it doesn't perform properly.

Yes, this should be the same performance tradeoff that 4.x had.  It's
optimized for retrieving the top N values, where N is small and most
of the time is finding the top ordinals.
To save memory, we don't load all string values into memory.  This
makes ord->string conversion more costly.

-Yonik
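
A rough sketch of the tradeoff described above, assuming a simplified in-memory model rather than Solr's actual UnInvertedField code. Counting works purely on integer ordinals, and only the N winning ordinals are converted to strings. The names top_n_facets and lookup_term, and the tiny term dictionary, are made up for illustration.

```python
import heapq
from collections import Counter

def top_n_facets(doc_ords, lookup_term, n):
    """Count term ordinals over the matching docs, then resolve only the top n."""
    counts = Counter()
    for ords in doc_ords:          # one list of term ordinals per matching document
        counts.update(ords)
    # Selecting the winners needs no strings at all, only (ordinal, count) pairs.
    top = heapq.nlargest(n, counts.items(), key=lambda kv: (kv[1], -kv[0]))
    # The costly step: one ord->string lookup per returned bucket.
    return [(lookup_term(ord_), count) for ord_, count in top]

terms = ["art", "history", "law", "music"]   # hypothetical term dictionary
lookups = []

def lookup_term(ord_):
    lookups.append(ord_)           # record how many conversions were needed
    return terms[ord_]

result = top_n_facets([[0, 2], [2], [2, 3], [0]], lookup_term, n=2)
print(result, len(lookups))
```

The number of lookups equals the number of buckets returned, so a large facet.limit multiplies exactly the part of the work that is not kept in memory.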



> Query used was:
>
> q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif
>
> --Ere
>
>

Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
I spoke a bit too soon. Now I see why I didn't see any improvement from 
facet.method=uif before: its performance seems to depend heavily on how 
many facets are returned. With an index of 6 million records and the 
facet having 1960 buckets:

facet.limit=20 takes 4ms
facet.limit=200 takes ~100ms
facet.limit=2000 takes ~1300ms

So, for some uses it provides a nice boost, but if you need to fetch 
more than a few top items, it doesn't perform properly.

Query used was:

q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif

--Ere
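
As a quick worked check of the timings above, the following sketch divides each reported time by the number of buckets actually returned. It assumes that all 1960 buckets match q=*:* with facet.mincount=1, so facet.limit=2000 returns 1960 buckets; the "~" figures are taken at face value.

```python
# Timings reported above for facet.method=uif on the 6-million-record index.
timings_ms = {20: 4, 200: 100, 2000: 1300}
total_buckets = 1960  # the facet has fewer buckets than the largest limit

def per_bucket_ms(limit, elapsed_ms):
    """Elapsed time divided by the number of buckets that can be returned."""
    returned = min(limit, total_buckets)
    return elapsed_ms / returned

for limit, elapsed in timings_ms.items():
    print(f"facet.limit={limit}: {per_bucket_ms(limit, elapsed):.2f} ms per returned bucket")
```

The cost per returned bucket rises from about 0.2 ms to about 0.66 ms instead of staying flat, so the time grows somewhat faster than the number of buckets returned.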

Ere Maijala wrote on 1.9.2017 at 13.10:
> I can confirm that we're seeing the same issue as Günter. For a 
> collection of 57 million bibliographic records, Solr 4.10.2 (without 
> docValues) can consistently return a facet in about 20ms, while Solr 
> 6.6.0 with docValues takes around 2600ms. I've tested some versions 
> between those two too, but I don't have comparable numbers for them.
> 
> I thought I had tried all the different combinations of
> docValues="true/false" and facet.method=fc/uif/enum, but now that I
> checked again, it seems that I may have missed a test, as a 6.6.0
> index with docValues="false" and facet.method=uif is markedly faster
> than with the other methods. At around 700ms it's still nowhere near
> as fast as 4.10.2, but a whole lot better. It seems that docValues
> needs to be disabled for facet.method=uif to have any effect though,
> which is unfortunate. Otherwise it reports that the applied method is
> UIF, but the performance is actually much worse than with FC. I'll do
> another round of testing to verify all this. I can report to SOLR-8096
> when I have something conclusive.
> 
> --Ere
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: slow solr facet processing

Posted by Ere Maijala <er...@helsinki.fi>.
I can confirm that we're seeing the same issue as Günter. For a 
collection of 57 million bibliographic records, Solr 4.10.2 (without 
docValues) can consistently return a facet in about 20ms, while Solr 
6.6.0 with docValues takes around 2600ms. I've tested some versions 
between those two too, but I don't have comparable numbers for them.

I thought I had tried all the different combinations of
docValues="true/false" and facet.method=fc/uif/enum, but now that I
checked again, it seems that I may have missed a test, as a 6.6.0
index with docValues="false" and facet.method=uif is markedly faster
than with the other methods. At around 700ms it's still nowhere near
as fast as 4.10.2, but a whole lot better. It seems that docValues
needs to be disabled for facet.method=uif to have any effect though,
which is unfortunate. Otherwise it reports that the applied method is
UIF, but the performance is actually much worse than with FC. I'll do
another round of testing to verify all this. I can report to SOLR-8096
when I have something conclusive.

--Ere
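
To avoid missing a combination in a test round like the one described above, the runs can be enumerated up front. This is only a sketch: build_matrix is a hypothetical helper, the field name "building" and the limit are borrowed from the query used elsewhere in this thread, and docValues is a schema setting, so each docValues value is assumed to correspond to a separately built index.

```python
from itertools import product
from urllib.parse import urlencode

DOC_VALUES = (True, False)            # schema setting, one index per value
FACET_METHODS = ("fc", "uif", "enum")

def build_matrix(field="building", limit=2000):
    """Return one entry per docValues/facet.method combination to be timed."""
    runs = []
    for doc_values, method in product(DOC_VALUES, FACET_METHODS):
        query = urlencode({
            "q": "*:*",
            "rows": 0,
            "facet": "true",
            "facet.field": field,
            "facet.mincount": 1,
            "facet.limit": limit,
            "facet.method": method,
            "debugQuery": "true",
        })
        runs.append({"docValues": doc_values, "facet.method": method, "query": query})
    return runs

runs = build_matrix()
for run in runs:
    print(run["docValues"], run["facet.method"], run["query"])
```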

Yonik Seeley wrote on 31.8.2017 at 20.04:
> A possible improvement for some multiValued fields might be to use the
> "uif" facet method (UnInvertedField was the default method for
> multiValued fields in 4.x)
> I'm not sure if you would need to reindex without docValues on that
> field to try it though.
> 
> Example: to enable on the "union" field, add f.union.facet.method=uif
> 
> Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466
> 
> -Yonik
> 
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: slow solr facet processing

Posted by Yonik Seeley <ys...@gmail.com>.
A possible improvement for some multiValued fields might be to use the
"uif" facet method (UnInvertedField was the default method for
multiValued fields in 4.x).
I'm not sure whether you would need to reindex without docValues on that
field to try it, though.

Example: to enable it on the "union" field, add f.union.facet.method=uif
to the request.

Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466
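
As a rough sketch of how such a request could be assembled (not taken from
this thread: the helper name build_facet_query is hypothetical, the field
names are borrowed from the query quoted below, and which fields actually
benefit from "uif" is an assumption that has to be measured), the per-field
override is simply one more request parameter:

```python
from urllib.parse import urlencode


def build_facet_query(facet_fields, uif_fields, limit=100, mincount=1):
    """Return a URL-encoded Solr query string for a facet-only request.

    facet_fields: field names to facet on.
    uif_fields:   subset of facet_fields that should use facet.method=uif
                  (an assumption here; test each field individually).
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
        ("facet.sort", "count"),
        ("debugQuery", "true"),  # keeps the timing section in the response
    ]
    # One facet.field parameter per field, as in the original query.
    for field in facet_fields:
        params.append(("facet.field", field))
    # Per-field override: f.<fieldname>.facet.method=uif
    for field in facet_fields:
        if field in uif_fields:
            params.append((f"f.{field}.facet.method", "uif"))
    return urlencode(params)


if __name__ == "__main__":
    fields = ["union", "navAuthor_full", "format", "language"]
    print(build_facet_query(fields, uif_fields={"union", "navAuthor_full"}))
```

Appending the resulting string to the core's /select URL and comparing the
"facet" time in the debug output with and without the override should show
whether the method helps for a given field.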

-Yonik


On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
<gu...@unibas.ch> wrote:
> Hi,
>
> In the meantime I came across the reason why facet processing in Solr has
> been slower since version 5.x than it was in version 4.x:
>
> https://issues.apache.org/jira/browse/SOLR-8096
> https://issues.apache.org/jira/browse/LUCENE-5666
>
> Various library networks across the world are suffering from the same
> symptoms.
>
> Facet processing is one of the most important features of a search server
> (for us), and it seems (at least IMHO) that there has been no solution for
> the issue since March 2015 (the release date of the last Solr 4 version).
>
> What are the plans / ideas of the Solr developers for a possible future
> solution? Or maybe there is already a solution I haven't seen so far.
>
> Thanks for any feedback
>
> Günter
>
>
>
> On 21.08.2017 15:35, guenterh.lists@bluewin.ch wrote:
>>
>> Hi,
>>
>> I can't figure out the reason why the facet processing in version 6 needs
>> significantly more time compared to version 4.
>>
>> The debugging response (for 30 million documents)
>>
>> solr 4
>> <lst name="process"><double name="time">280.0</double><lst
>> name="query"><double name="time">0.0</double></lst><lst name="facet"><double
>> name="time">280.0</double></lst>
>> (once the query is cached)
>> before caching: between 1.5 and 2 sec
>>
>>
>> solr 6.x (my last try was with 6.6)
>> without docValues for faceting fields (same schema as version 4)
>> <lst name="process"><double name="time">5874.0</double><lst
>> name="query"><double name="time">0.0</double></lst><lst name="facet"><double
>> name="time">5873.0</double></lst><lst name="facet_module"><double
>> name="time">0.0</double></lst>
>> The time does not improve even after repeating the query several
>> times.
>>
>>
>> solr 6.6 with docValues for faceting fields
>> <lst name="process"><double name="time">9837.0</double><lst
>> name="query"><double name="time">0.0</double></lst><lst name="facet"><double
>> name="time">9837.0</double></lst><lst name="facet_module"><double
>> name="time">0.0</double></lst>
>>
>> Query used (our production system, running version 4):
>>
>> http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
>>
>>
>> Running the queries on smaller indices (8 million docs), the difference is
>> similar, although the absolute figures for processing time are smaller.
>>
>>
>> Any hints as to why the differences are this large?
>>
>> Günter
>
> --
> Universität Basel
> Universitätsbibliothek
> Günter Hipler
> Projekt SwissBib
> Schoenbeinstrasse 18-20
> 4056 Basel, Schweiz
> Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
> E-Mail guenter.hipler@unibas.ch
> URL: www.swissbib.org  / http://www.ub.unibas.ch/
>

slow solr facet processing

Posted by Günter Hipler <gu...@unibas.ch>.
Hi,

In the meantime I came across the reason why facet processing in Solr has
been slower since version 5.x than it was in version 4.x:

https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666

Various library networks across the world are suffering from the same
symptoms.

Facet processing is one of the most important features of a search
server (for us), and it seems (at least IMHO) that there has been no
solution for the issue since March 2015 (the release date of the last
Solr 4 version).

What are the plans / ideas of the Solr developers for a possible future
solution? Or maybe there is already a solution I haven't seen so far.

Thanks for any feedback

Günter



On 21.08.2017 15:35, guenterh.lists@bluewin.ch wrote:
> Hi,
>
> I can't figure out the reason why the facet processing in version 6 
> needs significantly more time compared to version 4.
>
> The debugging response (for 30 million documents)
>
> solr 4
> <lst name="process"><double name="time">280.0</double><lst 
> name="query"><double name="time">0.0</double></lst><lst 
> name="facet"><double name="time">280.0</double></lst>
> (once the query is cached)
> before caching: between 1.5 and 2 sec
>
>
> solr 6.x (my last try was with 6.6)
> without docValues for faceting fields (same schema as version 4)
> <lst name="process"><double name="time">5874.0</double><lst 
> name="query"><double name="time">0.0</double></lst><lst 
> name="facet"><double name="time">5873.0</double></lst><lst 
> name="facet_module"><double name="time">0.0</double></lst>
> The time does not improve even after repeating the query several
> times.
>
>
> solr 6.6 with docValues for faceting fields
> <lst name="process"><double name="time">9837.0</double><lst 
> name="query"><double name="time">0.0</double></lst><lst 
> name="facet"><double name="time">9837.0</double></lst><lst 
> name="facet_module"><double name="time">0.0</double></lst>
>
> Query used (our production system, running version 4):
> http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
>
>
> Running the queries on smaller indices (8 million docs), the difference
> is similar, although the absolute figures for processing time are smaller.
>
>
> Any hints as to why the differences are this large?
>
> Günter

-- 
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hipler@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/