Posted to dev@lucene.apache.org by "Alice.H.Yang (mis.cnsh04.Newegg) 41493" <Al...@newegg.com> on 2014/05/23 11:45:57 UTC

(Issue) How improve solr facet performance

Hi, Solr Developers

       We are blocked by Solr facet performance when a query hits many documents (about 10,000,000).

Details as follows:
       I have a single Solr node (Solr version 4.6) holding a total of 320,000,000 documents (the local index is about 30G).

The index directory is as follows:
[image: index directory listing, not preserved in the plain-text archive]
The caches are configured as follows:
<filterCache class="solr.FastLRUCache"
             size="1024000"
             initialSize="1024000"
             autowarmCount="102400"/>

<queryResultCache class="solr.LRUCache"
                  size="1024000"
                  initialSize="1024000"
                  autowarmCount="102400"/>

<documentCache class="solr.LRUCache"
               size="30720"
               initialSize="30720"
               autowarmCount="3072"/>


When we query a field, e.g. default_search:type or default_search:retail,
numFound in the response is about one million and the query hits the cache; QTime is <int name="QTime">2</int> or <int name="QTime">1</int>.

But when we add several facet.field parameters to facet on, QTime increases to 220 ms or more.

We added the parameter facet.threads=20, but the improvement is not obvious and is unstable.

Do you have any advice on how to improve facet performance when a query hits many documents?



Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

ONCE YOU KNOW, YOU NEWEGG.

CONFIDENTIALITY NOTICE: This email and any files transmitted with it may contain privileged or otherwise confidential information. It is intended only for the person or persons to whom it is addressed. If you received this message in error, you are not authorized to read, print, retain, copy, disclose, disseminate, distribute, or use this message any part thereof or any information contained therein. Please notify the sender immediately and delete all copies of this message. Thank you in advance for your cooperation.


Re: (Issue) How improve solr facet performance

Posted by Aman Tandon <am...@gmail.com>.
Thanks Varun, but it's Aman, not Amit. :)

With Regards
Aman Tandon


On Fri, May 23, 2014 at 4:22 PM, Varun Thacker
<va...@gmail.com> wrote:

> @Amit - https://cwiki.apache.org/confluence/display/solr/DocValues

Re: (Issue) How improve solr facet performance

Posted by Varun Thacker <va...@gmail.com>.
@Amit - https://cwiki.apache.org/confluence/display/solr/DocValues
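For readers who have not used the feature: DocValues stores a field's values in a column-oriented, per-document structure built at index time, so faceting does not have to un-invert the field on the heap at query time. A minimal sketch of what enabling it for one facet field might look like in schema.xml follows. The field name "brand" is hypothetical and not taken from this thread, and existing documents would need to be reindexed before the setting takes effect.

```xml
<!-- schema.xml fragment (sketch). "brand" is a hypothetical facet field,
     not one named in this thread. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- docValues="true" asks Solr to write a per-document column of values
     for this field at index time; a full reindex is needed after adding it. -->
<field name="brand" type="string" indexed="true" stored="false" docValues="true"/>
```

Whether this helps in a given setup depends on the field's cardinality and on how much memory is left for the OS file system cache, so it is worth measuring before and after.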



-- 


Regards,
Varun Thacker
http://www.vthacker.in/

Re: (Issue) How improve solr facet performance

Posted by Jack Krupansky <ja...@basetechnology.com>.
(In the future, please submit this type of request to the solr-user email list. Thanks.)

How much system memory is available for OS file system caching? If your index (30G) doesn't fit in memory, that can cause thrashing.

I would suggest that 320 million docs is way too much for a single Solr node - performance degradation of faceting is one example of the effects of doing so. 100 million would be a more reasonable upper bound.

-- Jack Krupansky


Re: (Issue) How improve solr facet performance

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2014-05-23 at 11:45 +0200, Alice.H.Yang (mis.cnsh04.Newegg)
41493 wrote:
>        We are blocked by solr facet performance when query hits many
> documents. (about 10,000,000)

[320M documents, immediate response for plain search with 1M hits]

> But when we add several facet.field to do facet , QTime  increase to
> 220ms or more.

It is not clear whether your observation of increased response time is
due to many hits or faceting in itself.

- How many fields are you faceting on?
- How many unique values do your facet fields have (approximately)?
- What is the content of your facets (strings, numbers)?
- Which facet.method do you use?
- What is the response time with faceting and a few thousand hits?
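To make the facet.method question concrete: the method can be set explicitly, either per request or as a handler default, which also makes it easy to compare timings between methods. A sketch of the latter in solrconfig.xml follows. The field names "brand" and "category" are hypothetical, and the chosen values are only placeholders for experimentation, not recommendations.

```xml
<!-- solrconfig.xml fragment (sketch). Field names are hypothetical. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.field">brand</str>
    <str name="facet.field">category</str>
    <!-- Default method for all facet fields: enum, fc or fcs -->
    <str name="facet.method">fc</str>
    <!-- Per-field override, e.g. enum for a field with few unique values -->
    <str name="f.category.facet.method">enum</str>
    <!-- Number of threads used to compute facet fields in parallel -->
    <int name="facet.threads">4</int>
  </lst>
</requestHandler>
```

The same parameters can be passed on the request URL instead, for example facet.method=enum, so that each method can be timed against the same query and hit count.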

> Do you have some advice on how improve the facet performance when hit
> many documents. 

That depends on whether your bottleneck is the hit count itself, the
number of unique facet values or something else, such as I/O.


- Toke Eskildsen, State and University Library, Denmark



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: (Issue) How improve solr facet performance

Posted by Aman Tandon <am...@gmail.com>.
What is DocValues? I am unaware of this.

With Regards
Aman Tandon


On Fri, May 23, 2014 at 4:04 PM, Varun Thacker
<va...@gmail.com> wrote:

> Have you tried using DocValues and see the performance gains?

Re: (Issue) How improve solr facet performance

Posted by Varun Thacker <va...@gmail.com>.
Have you tried using DocValues and see the performance gains?



-- 


Regards,
Varun Thacker
http://www.vthacker.in/