Posted to solr-user@lucene.apache.org by Johannes Goll <jo...@gmail.com> on 2011/04/05 21:58:42 UTC
apache-solr-3.1 slow stats component queries
Hi,
thank you for making the new apache-solr-3.1 available.
I have installed the version from
http://apache.tradebit.com/pub//lucene/solr/3.1.0/
and am running into very slow stats component queries (~ 1 minute)
for fetching the computed sum of the stats field
url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
<int name="QTime">52825</int>
#documents: 78,359,699
total RAM: 256G
vm arguments: -server -Xmx40G
the stats.field specification is as follows:
<field name="weight" type="pfloat" indexed="true"
stored="false" required="true" multiValued="false"
default="1"/>
Filter queries that narrow down the number of documents reduce the QTime:
it seems to be proportional to the number of documents matched
by the filter query.
Is there any way to improve the performance of such stats queries?
Caching only improved the filter query performance; when larger
subsets are returned, QTime still increases unacceptably.
Since I only need the sum and not the stddev or sumOfSquares/min/max,
I have built a custom 3.1 version that returns only the sum, but this
improved the performance only slightly. Of course I could cache
the larger sum queries on the client side, but I want to do this only as a
last resort.
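The client-side fallback mentioned above could look roughly like the following. This is only a minimal sketch, not part of the original setup: the class name SumCache, the fetch_sum callable (which would send the query string to Solr and pull the sum out of the response) and the max_entries limit are all hypothetical. The request parameters are the ones from the url shown earlier, with optional fq filters appended.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def stats_query_string(stats_field, filter_queries=()):
    """Build the query string for a rows=0 stats request."""
    params = [
        ("q", "*:*"),
        ("start", "0"),
        ("rows", "0"),
        ("stats", "true"),
        ("stats.field", stats_field),
    ]
    # Sort the filters so the same set of filters always gives the same key.
    params.extend(("fq", fq) for fq in sorted(filter_queries))
    return urlencode(params)


class SumCache:
    """Small LRU cache mapping a stats query string to its sum."""

    def __init__(self, fetch_sum, max_entries=1000):
        # fetch_sum is a hypothetical callable: query string -> float.
        self._fetch_sum = fetch_sum
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, stats_field, filter_queries=()):
        key = stats_query_string(stats_field, filter_queries)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._fetch_sum(key)  # slow path: ask the server
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def clear(self):
        """Drop everything; cached sums are stale once the index changes."""
        self._entries.clear()
```

Such a cache only helps for repeated filter combinations, and it would have to be cleared whenever the index is updated.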
Thank you very much in advance for any ideas/suggestions.
Johannes
Re: apache-solr-3.1 slow stats component queries
Posted by Chris Hostetter <ho...@fucit.org>.
: Are there any plans for caching stat results for a certain stat field along
: with the documents that match a filter query? Any other ideas that could
: help to improve this (hardware/software configuration)? Even for a subset
: of 10M entries, the stat search takes on the order of 10 seconds.
I don't know of anyone working on it, and off the top of my head I don't
remember seeing any JIRA issues for it, but it certainly seems feasible to
add optional caching (the StatsComponent could have an optional init param
naming a user-declared cache).
Feel free to open a JIRA issue.
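To make the idea concrete, here is a language-neutral concept sketch in Python. It is not Solr code and not an existing StatsComponent feature: the class StatsResultCache, the compute_sum callable and the index_version argument are hypothetical names used only for illustration. The assumption is that a cached sum stays valid for a given field and set of filters until the index changes.

```python
class StatsResultCache:
    """Concept sketch: one cached sum per (field, filter set) per index version."""

    def __init__(self, compute_sum):
        # compute_sum is a hypothetical callable: (field, filters) -> float.
        self._compute_sum = compute_sum
        self._index_version = None
        self._sums = {}

    def get_sum(self, index_version, field, filters):
        # A commit changes the index version, which invalidates every entry.
        if index_version != self._index_version:
            self._sums.clear()
            self._index_version = index_version
        key = (field, frozenset(filters))  # filter order does not matter
        if key not in self._sums:
            self._sums[key] = self._compute_sum(field, key[1])
        return self._sums[key]
```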
-Hoss
Re: apache-solr-3.1 slow stats component queries
Posted by Johannes Goll <jo...@gmail.com>.
Hi,
I benchmarked the slow stats queries (6-point estimate) using the same
hardware on an index of size 104M. We use a modified Solr/Lucene 3.1 build
that returns only the sum and count for stats component results. Solr/Lucene
is run on Jetty.
The relationship between query time and the number of matching documents is
linear when using the stats component (R^2 = 0.99). I guess this is expected, as
the application needs to scan and sum up the stats field for all matching documents?
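For reference, an R^2 value like the one above can be obtained from a simple least-squares line fit. The function below is a generic sketch of that calculation, not the script used for the benchmark; the name linear_fit_r2 and its arguments are illustrative only.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R^2 of the fit."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, R^2 equals the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2
```

Passing the numbers of matching documents as xs and the measured QTime values as ys gives the slope (time per matching document), the intercept (fixed overhead) and R^2.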
Are there any plans for caching stat results for a certain stat field along
with the documents that match a filter query? Any other ideas that could
help to improve this (hardware/software configuration)? Even for a subset
of 10M entries, the stat search takes on the order of 10 seconds.
Thanks in advance.
Johannes
2011/4/18 Johannes Goll <jo...@gmail.com>
> any ideas why in this case the stats summaries are so slow ? Thank you
> very much in advance for any ideas/suggestions. Johannes
Re: apache-solr-3.1 slow stats component queries
Posted by Johannes Goll <jo...@gmail.com>.
Any ideas why the stats summaries are so slow in this case? Thank you
very much in advance for any ideas/suggestions. Johannes
--
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878