
Posted to solr-user@lucene.apache.org by "Alice.H.Yang (mis.cnsh04.Newegg) 41493" <Al...@newegg.com> on 2014/05/28 12:42:33 UTC

(Issue) How improve solr group performance

Hi, all
	Does anybody have any advice for me on Solr grouping performance? I have no idea how to improve it.

To David Smiley
  	I am not responsible for Endeca, so unfortunately I have no comment on it.

Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

-----Original Message-----
From: david.w.smiley@gmail.com [mailto:david.w.smiley@gmail.com]
Sent: May 27, 2014 21:29
To: solr-user@lucene.apache.org
Subject: Re: Reply: (Issue) How improve solr facet performance

Alice,

RE grouping, try Solr 4.8’s new “collapse” qparser w/ the “expand”
SearchComponent.  The ref guide has the docs.  It’s usually a faster equivalent approach to group=true.

Do you care to comment further on NewEgg’s apparent switch from Endeca to Solr?  (confirm true/false and rationale)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley


On Tue, May 27, 2014 at 4:17 AM, Alice.H.Yang (mis.cnsh04.Newegg) 41493 < Alice.H.Yang@newegg.com> wrote:

> Hi, Toke
>
> 1.
>         I set the 3 fields with hundreds of values to use fc and the
> rest to use enum; performance improved 2 times compared with no
> parameter. Then I added facet.method=20, and performance
> improved about 4 times compared with no parameter.
>         I also tried copying the 9 facet fields into one copyField;
> in my test, performance improved about 2.5 times compared with no parameter.
>         So it improved a lot with your advice, thanks a lot.
> 2.
>         Now I have another performance issue: grouping performance.
> The amount of data is the same as in the facet performance scenario.
> When the keyword search hits about one million documents, the QTime is
> about 600ms. (This is not the first run of the query; the caches are already warm.)
>
> Query url:
>
> select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&group=true&group.field=item_group_id&group.ngroups=true&group.sort=stock4sort%20desc,final_price%20asc,is_selleritem%20asc&sort=score%20desc,default_sort%20desc
>
> It needs a QTime of about 600ms.
>
> This query has two notable parameters:
>         1. fl with one field
>         2. group=true, group.ngroups=true
>
> If I set group=false, the QTime is only 1 ms.
> But I need group and group.ngroups. How can I improve the grouping
> performance under this requirement? Do you have any advice for me? I'm
> looking forward to your reply.
>
> Best Regards,
> Alice Yang
> +86-021-51530666*41493
> Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)
>
>
> -----Original Message-----
> From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
> Sent: May 24, 2014 15:17
> To: solr-user@lucene.apache.org
> Subject: RE: (Issue) How improve solr facet performance
>
> Alice.H.Yang (mis.cnsh04.Newegg) 41493 [Alice.H.Yang@newegg.com] wrote:
> > 1.  I'm sorry, I made a mistake: the total number of documents is
> > 32 million, not 320 million.
> > 2.  The system has plenty of memory for the Solr index: the OS has 256G
> > in total, and I set the Solr Tomcat HEAPSIZE="-Xms25G -Xmx100G"
>
> 100G is a very high number. What special requirements dictate such a
> large heap size?
>
> > Reply:  9 fields I facet on.
>
> Solr treats each facet separately and with facet.method=fc and 10M 
> hits, this means that it will iterate 9*10M = 90M document IDs and 
> update the counters for those.
>
> > Reply:  3 facet fields have one hundred unique values; the other 6 facet
> > fields have between 3 and 15 unique values.
>
> So very low cardinality. This is confirmed by your low response time 
> of 6ms for 2925 hits.
>
> > And we tested this scenario: for the facet fields with fewer unique
> > values we add facet.method=enum; it improves performance only a little.
>
> That is a shame: enum is normally the simple answer to a setup like yours.
> Have you tried fine-tuning your fc/enum selection, so that the 3
> fields with hundreds of values use fc and the rest use enum? That
> might halve your response time.
>
>
> Since the number of unique facets is so low, I do not think that 
> DocValues can help you here. Besides the fine-grained 
> fc/enum-selection above, you could try collapsing all 9 facet-fields 
> into a single field. The idea behind this is that for facet.method=fc, 
> performing faceting on a field with (for example) 300 unique values 
> takes practically the same amount of time as faceting on a field with 
> 1000 unique values: Faceting on a single slightly larger field is much faster than faceting on 9 smaller fields.
> After faceting with facet.limit=-1 on the single super-facet-field, 
> you must match the returned values back to their original fields:
>
>
> If you have the facet-fields
>
> field0: 34
> field1: 187
> field2: 78432
> field3: 3
> ...
>
> then collapse them by or-ing a field-specific mask that is bigger than 
> the max in any field, then put it all into a single field:
>
> fieldAll: 0xA0000000 | 34
> fieldAll: 0xA1000000 | 187
> fieldAll: 0xA2000000 | 78432
> fieldAll: 0xA3000000 | 3
> ...
>
> perform the facet request on fieldAll with facet.limit=-1 and split 
> the resulting counts with
>
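> for (entry : facetResultAll) {
>   // The top byte identifies the source field; the low 24 bits hold the original value.
>   int value = entry.value & 0x00FFFFFF;
>   switch (0xFF000000 & entry.value) {
>     case 0xA0000000:
>       field0.add(value, entry.count);
>       break;
>     case 0xA1000000:
>       field1.add(value, entry.count);
>       break;
> ...
>   }
> }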
>
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark
>

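The field-packing scheme Toke describes in the quoted message above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: facet values are non-negative integers that fit in 24 bits, there are at most 16 source fields, and the class name, method names, and sample counts are made up for the example rather than taken from the thread or from any Solr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PackedFacets {
    // Low 24 bits carry the original facet value; the top byte carries the field tag.
    static final int VALUE_MASK = 0x00FFFFFF;

    // Tag for field i, following the 0xA0000000, 0xA1000000, ... pattern (i in 0..15).
    public static int tag(int fieldIndex) {
        if (fieldIndex < 0 || fieldIndex > 15) {
            throw new IllegalArgumentException("fieldIndex must be in 0..15");
        }
        return 0xA0000000 | (fieldIndex << 24);
    }

    // Index time: combine the field tag and the value into one integer for fieldAll.
    public static int pack(int fieldIndex, int value) {
        if ((value & ~VALUE_MASK) != 0) {
            throw new IllegalArgumentException("value does not fit in 24 bits");
        }
        return tag(fieldIndex) | value;
    }

    // Recover the field index (low nibble of the top byte) from a packed value.
    public static int fieldOf(int packed) {
        return (packed >>> 24) & 0x0F;
    }

    // Recover the original facet value from a packed value.
    public static int valueOf(int packed) {
        return packed & VALUE_MASK;
    }

    // Query time: split the counts returned for fieldAll back into per-field counts.
    public static Map<Integer, Map<Integer, Long>> split(Map<Integer, Long> packedCounts) {
        Map<Integer, Map<Integer, Long>> perField = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : packedCounts.entrySet()) {
            perField.computeIfAbsent(fieldOf(e.getKey()), k -> new LinkedHashMap<>())
                    .put(valueOf(e.getKey()), e.getValue());
        }
        return perField;
    }

    public static void main(String[] args) {
        // Hypothetical facet counts, as if returned for fieldAll with facet.limit=-1.
        Map<Integer, Long> counts = new LinkedHashMap<>();
        counts.put(pack(0, 34), 120L);
        counts.put(pack(1, 187), 45L);
        counts.put(pack(2, 78432), 7L);
        System.out.println(split(counts)); // {0={34=120}, 1={187=45}, 2={78432=7}}
    }
}
```

The range check in tag() matters because the tag is built with a bitwise OR: an index above 15 could set bits that are already set in 0xA0 and collide with another field's tag.
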
Re: (Issue) How improve solr group performance

Posted by Joel Bernstein <jo...@gmail.com>.

Alice,

How many unique groups are there in the field that you are grouping on?

When testing out the CollapsingQParserPlugin, take a look at the nullPolicy
option. If you're working with a product catalog, there is often a scenario
where some products belong to a group and some don't. For products that
don't have a group you can place a null in the group field and use the
"expand" nullPolicy, which will place each null group record in its own
group. Using the nullPolicy like this will be much more memory efficient
than placing a "fake" group id in the grouping field.
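
As a rough sketch of what such a request could look like, the Java snippet below builds a query string that replaces group=true with a collapse filter plus the expand component. It assumes Solr 4.8 or later, as mentioned earlier in the thread. The field names and the q value are copied from the query Alice posted; the class and method names are hypothetical. The multi-field group.sort from the original query is deliberately left out, because the sketch makes no attempt to reproduce it with collapse.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class CollapseQuery {
    // Builds the collapse filter in local-params syntax, e.g.
    // {!collapse field=item_group_id nullPolicy=expand}. Pass null to omit nullPolicy.
    public static String collapseFilter(String field, String nullPolicy) {
        StringBuilder fq = new StringBuilder("{!collapse field=").append(field);
        if (nullPolicy != null) {
            fq.append(" nullPolicy=").append(nullPolicy);
        }
        return fq.append('}').toString();
    }

    // URL-encodes the parameters and joins them into a query string.
    public static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "default_search:paramter");
        params.put("defType", "edismax");
        params.put("fl", "item_catalog");
        params.put("rows", "50");
        // Collapse on the group field; documents with no group id each stay separate.
        params.put("fq", collapseFilter("item_group_id", "expand"));
        // Ask the expand component to return the other members of each collapsed group.
        params.put("expand", "true");
        params.put("sort", "score desc,default_sort desc");
        System.out.println("select?" + toQueryString(params));
    }
}
```

With collapsing, the reported numFound should reflect the collapsed result set, which may be able to stand in for group.ngroups; that is worth verifying against your own index before relying on it.
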





Joel Bernstein
Search Engineer at Heliosearch

