Posted to solr-user@lucene.apache.org by Patanachai Tangchaisin <pa...@wizecommerce.com> on 2013/11/05 02:38:22 UTC

Disjunctive Queries (OR queries) and FilterCache

Hello,

We run our search system on Apache Solr 4.2.1 in a
master/slave setup.
Our index has ~100M documents and is ~20 GB in size.
The machine has 24 CPUs and 48 GB of RAM.

Our response time is pretty bad: the median is ~4 seconds at 25
queries/second.

We noticed a couple of things:
- Our machine always uses 100% CPU.
- There is a lot of room in the Java heap. We set Xms12g and Xmx16g, but
the heap size is still only 12g.
- Solr's filterCache hit ratio is only 0.76, and the numbers of insertions
and evictions are almost equal.

The weird thing is:
- Most items in Solr's filterCache (looking at only the first 100) are
specific to a single field, which we filter with an OR query. Note
that every request has a constraint on this field.

For example, if the field name is x:
fq=x:(1 OR 2 OR 3)&fq=y:'a'
fq=x:(3 OR 2 OR 1)&fq=y:'b'
fq=x:(2 OR 1 OR 3)&fq=y:'c'

The order of the items differs because they come from a different
system.

To me, it seems that Solr caches this filter as a separate entry whenever
the order of the items differs, e.g. "(1 OR 2)" and "(2 OR 1)" become
two different cache entries.
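
To illustrate the suspicion above, here is a minimal Python sketch. It assumes the cache key can be modeled as the raw filter string, which is a simplification of how Solr keys its cache. The helper name `fq_for` is hypothetical.

```python
from itertools import permutations

def fq_for(field, values):
    """Build an OR filter string in exactly the order given (no normalization)."""
    return f"{field}:({' OR '.join(values)})"

values = ["1", "2", "3"]

# Every ordering of the same values yields a different filter string.
distinct_keys = {fq_for("x", p) for p in permutations(values)}

# Sorting first collapses all orderings onto a single string.
canonical_keys = {fq_for("x", sorted(p)) for p in permutations(values)}

print(len(distinct_keys))   # one string per ordering of the three values
print(len(canonical_keys))  # a single canonical string
```

Under this model the number of possible keys for one logical filter grows with the factorial of the number of values, so even a large cache can fill up with equivalent entries.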

Question:
Is there another way to write an fq parameter using 'OR' so that Solr
caches these as the same entry?


Thanks,
Patanachai Tangchaisin

CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Disjunctive Queries (OR queries) and FilterCache

Posted by Erick Erickson <er...@gmail.com>.
Glad to hear you have a solution....

Best,
Erick



Re: Disjunctive Queries (OR queries) and FilterCache

Posted by Patanachai Tangchaisin <pa...@wizecommerce.com>.
Hi Erick,

About the size of the filter cache: previously we had it set to 4,000.
After we hit this problem, we raised it to 10,000.
Even at 10,000 (always full), the hit ratio was 0.78 and evictions were as high as insertions.

About the 100% CPU: yes, it was Solr using it.
I profiled the application, and "DisjunctionSumScorer" was taking most of the CPU time.
Since this is a required filter query, we set it on every request.
My assumption is that because Solr could not use the filter cache, the filter query had to be executed along with the main query each time.

However, we fixed this problem by sorting our filter constraints before creating the filter query.
So {"1","2","3"}, {"2","3","1"}, and {"3","2","1"} all become the same filter query, i.e. fq=x:("1" OR "2" OR "3").
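
A minimal sketch of that kind of client-side sorting, in Python. The function name `build_or_filter` is hypothetical, and the sketch assumes the values contain no double quotes or other characters that would need escaping.

```python
def build_or_filter(field, values):
    """Return a filter string such as x:("1" OR "2" OR "3").

    Values are de-duplicated and sorted as strings, so equal sets of
    values always produce the same filter string.
    """
    unique_sorted = sorted(set(str(v) for v in values))
    if not unique_sorted:
        raise ValueError("at least one value is required")
    quoted = " OR ".join(f'"{v}"' for v in unique_sorted)
    return f"{field}:({quoted})"

print(build_or_filter("x", ["2", "3", "1"]))
print(build_or_filter("x", ["3", "2", "1"]))
```

The sort is lexicographic on strings, so "10" sorts before "2". That does not matter here: the only requirement is that the order is the same every time.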

We ended up with a very small filter cache (<1,000 entries), and the hit ratio is now 0.99. There are no evictions at all.
The median response time is now less than 200 ms at 25 QPS.
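
The effect on hit ratio and evictions can be reproduced with a toy model. The sketch below is not Solr's cache implementation: it assumes a plain LRU cache keyed by the filter string, and the 50 value sets, the cache size of 100, and the request count are made-up numbers chosen only for illustration.

```python
import random
from collections import OrderedDict

def simulate(normalize, requests=5000, cache_size=100, seed=42):
    """Replay random filter requests against a toy LRU cache.

    Returns (hit_ratio, evictions). The cache key is the filter string.
    """
    rng = random.Random(seed)
    # 50 hypothetical value sets of four values each.
    value_sets = [[str(i * 10 + j) for j in range(4)] for i in range(50)]
    cache = OrderedDict()
    hits = evictions = 0
    for _ in range(requests):
        values = list(rng.choice(value_sets))
        rng.shuffle(values)                # upstream order is arbitrary
        if normalize:
            values.sort()                  # canonical order before building the key
        key = "x:(" + " OR ".join(values) + ")"
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
                evictions += 1
    return hits / requests, evictions

unsorted_ratio, unsorted_evictions = simulate(normalize=False)
sorted_ratio, sorted_evictions = simulate(normalize=True)
print(f"unsorted: hit ratio {unsorted_ratio:.2f}, evictions {unsorted_evictions}")
print(f"sorted:   hit ratio {sorted_ratio:.2f}, evictions {sorted_evictions}")
```

With sorting, at most 50 distinct keys exist, so they all fit in the cache and nothing is evicted. Without sorting, the same 50 logical filters spread over many more keys than the cache can hold.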

Thanks,
Patanachai


Re: Disjunctive Queries (OR queries) and FilterCache

Posted by Erick Erickson <er...@gmail.com>.
Yeah, Solr's fq cache is pretty simple-minded,
order matters. There's no good way to improve
that except try to write your fq queries in the
same order. It's actually quite tricky to
disassemble/reassemble arbitrary queries to fix
this problem.

But in your case, you could write a custom query
component that was able to handle this _specific_
case relatively easily I should think.
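
As a rough illustration of handling only this specific case, here is a Python sketch that normalizes filter strings on the client before they are sent. It is not a Solr query component, and the name `normalize_or_filter` is hypothetical. It deliberately recognizes only the flat form field:(a OR b OR c) and leaves everything else untouched.

```python
import re

# Matches only the simple form  field:(...)  for a single field.
_OR_GROUP = re.compile(r'^(\w+):\((.+)\)$')

def normalize_or_filter(fq):
    """Sort the terms of a flat single-field OR filter.

    Anything that does not look like that form is returned unchanged
    rather than guessing at its structure.
    """
    match = _OR_GROUP.match(fq.strip())
    if not match:
        return fq
    field, body = match.groups()
    terms = [t.strip() for t in body.split(" OR ")]
    # Bail out on nested groups, other operators, or terms with whitespace.
    if any(not t or re.search(r'[()\s]', t) for t in terms):
        return fq
    return f"{field}:({' OR '.join(sorted(set(terms)))})"

print(normalize_or_filter("x:(3 OR 2 OR 1)"))
print(normalize_or_filter("x:((1 OR 2) AND 3)"))  # left as-is
```

Returning unrecognized input unchanged keeps the sketch safe for arbitrary queries, which are the hard part of the general problem.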

bq: Our machine always use 100% CPU

This is strange. Are you sure Solr is using this?
Are there any other processes on the server that
might be using this? top (*nix) might help here. If
it's really all Solr, then you need another slave
or two to handle the load. Do you get good responses
when the QPS rate is, say, 10?

How big is your filter cache?

A hit ratio of .76 isn't actually too bad. It looks like
you're running for a long time, and if so the insert
and eviction numbers will tend to the same number.

Do beware of using NOW in your fq clauses, that can
cause grief. See:
http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
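
The usual trouble with a bare NOW is that it resolves to the current millisecond, so every request produces a filter that differs from the previous one. A hedged Python sketch of one way around it, using Solr date-math rounding (NOW/DAY); the field name `timestamp` and the function name are hypothetical:

```python
from urllib.parse import urlencode

def recent_docs_filter(field, days):
    """Range filter over the last `days` days, rounded to day boundaries
    so the filter string is identical for every request made the same day."""
    return f"{field}:[NOW/DAY-{days}DAYS TO NOW/DAY+1DAY]"

fq = recent_docs_filter("timestamp", 7)
print(fq)

# The '+' in the date math must be URL-encoded when sent as a request parameter.
print(urlencode({"q": "*:*", "fq": fq}))
```

Rounding trades precision for cache reuse, so pick the coarsest unit the application can tolerate.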

This seems like really poor performance, I'm puzzled.

Best,
Erick



