Posted to solr-user@lucene.apache.org by Agnieszka Kukałowicz <ag...@usable.pl> on 2012/07/16 13:00:01 UTC

Grouping performance problem

Hi,

Is there any way to make grouping searches more efficient?

My queries look like:
/select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

For an index with 3 million documents, a query for all docs with group=true
takes almost 4000 ms. Because the queryResultCache is not used, subsequent
queries also take a long time.

When I remove group=true and leave only faceting, the query for all docs
takes much less time: about 700 ms the first time and only 200 ms on later
runs, because the queryResultCache is used.

So with group=true the query is about 20 times slower than without it.
Is this to be expected, or is there any way to improve performance with grouping?
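
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two query variants could be built. The parameters are copied from the query above; the host, port and path prefix in BASE_URL are assumptions, since the post gives only the /select path.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the post only shows the /select path.
BASE_URL = "http://localhost:8983/solr/select"

# Faceting parameters, exactly as posted.
FACET_PARAMS = [
    ("q", "query"),
    ("facet.field", "category1"),
    ("facet.missing", "false"),
    ("facet.mincount", "1"),
]

# Grouping parameters, exactly as posted.
GROUP_PARAMS = [
    ("group", "true"),
    ("group.field", "id"),
    ("group.facet", "true"),
    ("group.ngroups", "true"),
]


def build_url(with_grouping: bool) -> str:
    """Return the request URL with or without the grouping parameters."""
    params = FACET_PARAMS + (GROUP_PARAMS if with_grouping else [])
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url(True))   # grouped variant (the slow one in the post)
    print(build_url(False))  # facet-only variant
```

Timing each URL several times in a row would also show whether repeated requests get faster, which is the caching effect described above.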

My application needs the grouping feature and all of the queries use it, but
their performance is too low for production use.

I use Solr 4.x from trunk

Agnieszka Kukalowicz

Re: Grouping performance problem

Posted by arres <he...@gmail.com>.
Hello there, 
I am facing the same problem.
Did anyone find a solution yet?
Thank you,
arres



--
View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-problem-tp3995245p4138419.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Grouping performance problem

Posted by shamik <sh...@gmail.com>.
Bumping up this thread as I'm facing a similar issue. Any solution?



--
View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-problem-tp3995245p4098566.html

Re: Grouping performance problem

Posted by davidduffett <da...@espares.co.uk>.
Agnieszka,

Did you find a good solution to your performance problem with grouping?  I
have an index with 45m records and am using grouping and the performance is
atrocious.

Any advice would be very welcome!

Thanks in advance,
David



--
View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-problem-tp3995245p4056113.html

Re: Grouping performance problem

Posted by Agnieszka Kukałowicz <ag...@usable.pl>.
Hi,

I ran some more tests to find out what exactly slows the queries down.
While debugging, I found that queries using group.facet=true are
much slower than queries without it.
For example:

query with group.facet=true:
<lst name="process">
<double name="time">4524.0</double>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">1.0</double>
</lst>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">878.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">3449.0</double>
</lst>

query without group.facet=true:
<lst name="process">
<double name="time">1409.0</double>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">1.0</double>
</lst>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">1075.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">134.0</double>
</lst>

The difference is in FacetComponent. With group.facet=true this component is
about 25 times slower.
My application needs facet counts on groups, not on documents, but with
times like those above group.facet is not very useful for me.
Is this a temporary situation, and is any work under way to improve it?
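
The comparison above can be checked with a short script. This is only a sketch: it uses the timings quoted in the two excerpts, and it adds the closing tag for the "process" element, which the pasted excerpts omit, so that the XML parses. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET

PREFIX = "org.apache.solr.handler.component."

# Well-formed version of the timing excerpt; the closing </lst> is added here.
TEMPLATE = """<lst name="process">
  <double name="time">{total}</double>
  <lst name="org.apache.solr.handler.component.SpellCheckComponent">
    <double name="time">{spell}</double>
  </lst>
  <lst name="org.apache.solr.handler.component.QueryComponent">
    <double name="time">{query}</double>
  </lst>
  <lst name="org.apache.solr.handler.component.FacetComponent">
    <double name="time">{facet}</double>
  </lst>
</lst>"""

# Values copied from the two excerpts in this message.
WITH_GROUP_FACET = TEMPLATE.format(total=4524.0, spell=1.0, query=878.0, facet=3449.0)
WITHOUT_GROUP_FACET = TEMPLATE.format(total=1409.0, spell=1.0, query=1075.0, facet=134.0)


def component_times(process_xml: str) -> dict:
    """Map each component's short class name to its time in milliseconds."""
    root = ET.fromstring(process_xml)
    times = {}
    for lst in root.findall("lst"):
        time_el = lst.find("double[@name='time']")
        if time_el is not None:
            times[lst.get("name", "").replace(PREFIX, "")] = float(time_el.text)
    return times


slow = component_times(WITH_GROUP_FACET)
fast = component_times(WITHOUT_GROUP_FACET)
facet_ratio = slow["FacetComponent"] / fast["FacetComponent"]

if __name__ == "__main__":
    print(f"FacetComponent is {facet_ratio:.1f}x slower with group.facet=true")
```

With these numbers QueryComponent is not slower with group.facet=true (878 ms against 1075 ms), so the extra time comes from FacetComponent.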

Best
Agnieszka





Re: Grouping performance problem

Posted by al...@aim.com.
This is strange. Our data folder is 24 GB and Java has 2 GB of RAM. We query with grouping, ngroups and highlighting, and do not query all fields; query time is mostly under 1 sec and rarely goes up to 2 sec. We use Solr 3.6 and have turned off all kinds of caching.
Maybe your problem is with caching and displaying all fields?

Hope this may help.

Alex.




Re: Grouping performance problem

Posted by Agnieszka Kukałowicz <ag...@usable.pl>.
I have a server with 24 GB of RAM. I have 4 shards on it, each of them with 4 GB
of RAM for Java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The index size is about 15 GB per shard (I use an SSD for the index data).
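
As a rough worked example, the figures above add up as follows. Only the four input numbers come from this message. Reading the leftover RAM as what remains for the operating system (including its file cache) is an assumption of this sketch, not something measured here.

```python
# Figures stated in this message.
total_ram_gb = 24
shards = 4
heap_per_shard_gb = 4      # -Xms4096M -Xmx4096M
index_per_shard_gb = 15

# Derived values.
total_heap_gb = shards * heap_per_shard_gb           # RAM reserved for the JVM heaps
ram_outside_heap_gb = total_ram_gb - total_heap_gb   # assumption: left for the OS and its file cache
total_index_gb = shards * index_per_shard_gb         # index data across all shards

if __name__ == "__main__":
    print(f"JVM heaps:        {total_heap_gb} GB")
    print(f"RAM outside heap: {ram_outside_heap_gb} GB")
    print(f"Index on disk:    {total_index_gb} GB")
```

So the heaps take 16 GB of the 24 GB, and the remaining 8 GB sits next to about 60 GB of index data on the SSD.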

Agnieszka



Re: Grouping performance problem

Posted by al...@aim.com.
What is the RAM of your server and the size of the data folder?




Re: Grouping performance problem

Posted by Agnieszka Kukałowicz <ag...@usable.pl>.
Hi Pavel,

I tried with group.ngroups=false but didn't notice a big improvement.
The times were still about 4000 ms. It doesn't solve my problem.
Maybe this is because of my index type. I have millions of documents but
only about 20 000 groups.

 Cheers
 Agnieszka


Re: Grouping performance problem

Posted by Pavel Goncharik <pa...@gmail.com>.
Hi Agnieszka,

If you don't need the number of groups, you can try leaving out the
group.ngroups=true param.
In that case Solr apparently skips calculating all groups and delivers
results much faster.
At least for our application, the difference in performance
with and without group.ngroups=true is significant (I should add that we use
Solr 3.6).
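
A minimal sketch of that change, applied to the query string from the original post. The helper name without_param is only illustrative. Whether dropping the parameter helps seems to depend on the index: elsewhere in this thread Agnieszka reports that it made little difference for her.

```python
from urllib.parse import parse_qsl, urlencode

# Query string from the original post (without the leading /select?).
ORIGINAL_QUERY = (
    "q=query&group=true&group.field=id&group.facet=true&group.ngroups=true"
    "&facet.field=category1&facet.missing=false&facet.mincount=1"
)


def without_param(query_string: str, name: str) -> str:
    """Return the query string with every occurrence of one parameter removed."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return urlencode([(key, value) for key, value in pairs if key != name])


if __name__ == "__main__":
    print("/select?" + without_param(ORIGINAL_QUERY, "group.ngroups"))
```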

WBR,
Pavel
