Posted to solr-user@lucene.apache.org by Johannes Goll <jo...@gmail.com> on 2011/04/05 21:58:42 UTC

apache-solr-3.1 slow stats component queries

Hi,

thank you for making the new apache-solr-3.1 available.

I have installed the version from

http://apache.tradebit.com/pub//lucene/solr/3.1.0/

and am running into very slow stats component queries (~ 1 minute)
for fetching the computed sum of the stats field

url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight

<int name="QTime">52825</int>

#documents:     78,359,699
total RAM:         256G
vm arguments:  -server -Xmx40G

the stats.field specification is as follows:
<field name="weight" type="pfloat" indexed="true" stored="false"
       required="true" multiValued="false" default="1"/>

Filter queries that narrow down the number of documents reduce the query
time: QTime seems to be proportional to the number of documents matched
by the filter query.
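
For reference, the request above can be reproduced from a small client
script. This is only a sketch: the helper names are made up for
illustration, the optional fq value is whatever filter you use, and the sum
in the sample response is a placeholder, not a measured result. It assumes
the XML response layout that the stats component writes
(stats -> stats_fields -> field name).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def stats_query(field, fq=None):
    """Build the query string for a stats-only request (rows=0, no documents)."""
    params = [("q", "*:*"), ("start", 0), ("rows", 0),
              ("stats", "true"), ("stats.field", field)]
    if fq:
        # Optional filter query that narrows the set of documents to sum over.
        params.append(("fq", fq))
    # Keep '*' and ':' unescaped so the string matches the URL in this thread.
    return "?" + urlencode(params, safe="*:")


def parse_stats(xml_text, field):
    """Return (QTime in ms, sum of the stats field) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    qtime = int(root.find(
        "./lst[@name='responseHeader']/int[@name='QTime']").text)
    total = float(root.find(
        "./lst[@name='stats']/lst[@name='stats_fields']"
        f"/lst[@name='{field}']/double[@name='sum']").text)
    return qtime, total


# Trimmed sample response. QTime is the value reported above; the sum is a
# made-up placeholder.
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int><int name="QTime">52825</int>
  </lst>
  <result name="response" numFound="78359699" start="0"/>
  <lst name="stats"><lst name="stats_fields"><lst name="weight">
    <double name="sum">123.5</double>
  </lst></lst></lst>
</response>"""

if __name__ == "__main__":
    print(stats_query("weight"))
    print(parse_stats(SAMPLE, "weight"))
```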

Is there any way to improve the performance of such stats queries?
Caching only helped to improve the filter query performance, but when
larger subsets are returned, QTime increases unacceptably.

Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
I have created a custom 3.1 version that returns only the sum. But this
improved the performance only slightly. Of course I could somehow cache
the larger sum queries on the client side, but I want to do this only as a
last resort.
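
The client-side fallback could look roughly like the sketch below. It is
only an illustration: SumCache and the fetch_sum callable are hypothetical
names, and fetch_sum stands in for whatever code actually sends the stats
request to Solr.

```python
from collections import OrderedDict


class SumCache:
    """Small LRU cache mapping (filter query, field) to a fetched sum."""

    def __init__(self, fetch_sum, max_entries=1024):
        self._fetch = fetch_sum        # hypothetical callable: (fq, field) -> sum
        self._max = max_entries
        self._entries = OrderedDict()  # ordered from oldest to most recent use

    def get(self, fq, field):
        key = (fq, field)
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the slow query.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = self._fetch(fq, field)
        self._entries[key] = value
        if len(self._entries) > self._max:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value

    def clear(self):
        """Drop all entries, e.g. after the index has been updated."""
        self._entries.clear()
```

The cached sums go stale whenever the index changes, so clear() would have
to be called after every commit that touches the weight field.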

Thank you very much in advance for any ideas/suggestions.

Johannes

Re: apache-solr-3.1 slow stats component queries

Posted by Chris Hostetter <ho...@fucit.org>.
: Are there any plans for caching stat results for a certain stat field along
: with the documents that match a filter query ? Any other ideas that could
: help to improve this (hardware/software configuration) ?  Even for a subset
: of 10M entries, the stat search takes on the order of 10 seconds.

I don't know of anyone working on it, and off the top of my head I don't
remember seeing any Jira issues for it, but it certainly seems feasible to
add optional caching (the StatsComponent could have an optional init param
naming a user-declared cache).

Feel free to open a jira issue.


-Hoss

Re: apache-solr-3.1 slow stats component queries

Posted by Johannes Goll <jo...@gmail.com>.
Hi,

I benchmarked the slow stats queries (6-point estimate) using the same
hardware on an index of size 104M. We use a modified Solr/Lucene 3.1 that
returns only the sum and count in the stats component results. Solr/Lucene
runs on Jetty.

The relationship between query time and the number of matching documents
is linear when using the stats component (R^2 = 0.99). I guess this is
expected, since the application needs to scan and sum up the stats field
for all matching documents?
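
To make the linear cost concrete, here is a simplified model of what such a
scan has to do, plus the per-document cost implied by the numbers in this
thread. It is a sketch under assumptions, not Solr's actual implementation:
sum_field and per_doc_cost_us are made-up names, and the field values are
modelled as a plain list indexed by document id.

```python
def sum_field(values, matching_doc_ids):
    """Sum a per-document field over the matching documents.

    One lookup and one addition per matching document, so the run time
    grows linearly with the size of the matching set.
    """
    total = 0.0
    for doc_id in matching_doc_ids:
        total += values[doc_id]
    return total


def per_doc_cost_us(qtime_ms, num_docs):
    """Average time per matching document, in microseconds."""
    return qtime_ms * 1000.0 / num_docs


if __name__ == "__main__":
    # Toy index: four documents, three of which match the filter.
    print(sum_field([1.0, 2.5, 1.0, 4.0], [0, 1, 3]))
    # First report in this thread: QTime 52825 ms for 78,359,699 documents.
    print(per_doc_cost_us(52825, 78_359_699))
    # Second report: on the order of 10 seconds for a 10M subset.
    print(per_doc_cost_us(10_000, 10_000_000))
```

Both measurements come out at roughly one microsecond per matching document
or less, which is consistent with a straight scan. Under this model, only
avoiding the scan (for example by caching or precomputing sums) would change
the picture for large subsets.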

Are there any plans for caching stat results for a certain stat field along
with the documents that match a filter query ? Any other ideas that could
help to improve this (hardware/software configuration) ?  Even for a subset
of 10M entries, the stat search takes on the order of 10 seconds.

Thanks in advance.
Johannes



2011/4/18 Johannes Goll <jo...@gmail.com>

> any ideas why in this case the stats summaries are so slow  ?  Thank you
> very much in advance for any ideas/suggestions. Johannes
Re: apache-solr-3.1 slow stats component queries

Posted by Johannes Goll <jo...@gmail.com>.
Any ideas why the stats summaries are so slow in this case? Thank you
very much in advance for any ideas/suggestions. Johannes


-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878