Posted to solr-user@lucene.apache.org by Gal Nitzan <ga...@gmail.com> on 2007/05/31 10:33:39 UTC

facet question

Hi,

We have a small index with about 4 million docs.

On this index we have a field "tags" which is a multi-valued field.

Running a facet query on the index with something like: 
facet=true&facet.field=tags&q=type:video takes about 1 minute.

We have defined a large cache which enables the query to run much faster 
(about 1 sec)

<filterCache
 class="solr.LRUCache"
 size="1500000"
 initialSize="600000"
 autowarmCount="300000"/>


However, the cache size brings us to the 2GB limit.
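
A rough back-of-the-envelope, assuming the cached filters are stored as plain
bitsets: one bitset over a ~4M-document index is about 4,000,000 / 8 ≈ 500 KB,
so on the order of 4,000 such entries already accounts for roughly 2 GB. A cache
sized for 1.5M entries can only fit in that budget if most of the sets are kept
in something far more compact than a full bitset.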


Any thoughts or tips would be appreciated.


Gal.



RE: facet question

Posted by Chris Hostetter <ho...@fucit.org>.
: > Also, I'm still suspicious about your application.  You have 1.5M
: > distinct tags for 4M documents?  That seems quite dense.

it's possible the app is using the filterCache for other things (on other
fields) besides just the tag field ... but that still doesn't explain one
thing...

: description: 	 LRU Cache(maxSize=1000000, initialSize=600000,

...that doesn't look like it matches the config you posted earlier...

>>>: <filterCache
>>>:  class="solr.LRUCache"
>>>:  size="1500000"          <--- not 1000000

...either way if you have that many unique "tags" i think the HashDocSet
suggestion may be the best way to go, since each tag probably has a very
low cardinality (i can't imagine they'd be very high with that kind of
ratio)
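
For concreteness, the knob in question is the HashDocSet entry in the <query>
section of solrconfig.xml; a minimal sketch, with values copied from the example
config rather than tuned for this index:

<!-- sets smaller than maxSize are kept as hash-based DocSets instead of full
     bitsets; raising maxSize lets more of the low-cardinality tag filters use
     the compact representation (the values here are only illustrative) -->
<HashDocSet maxSize="3000" loadFactor="0.75"/>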

I would also give serious thought to caching the solr results externally
(using squid or memcached or something like that) ... Solr will cache the
individual computations for you very well .. but for something like a tag
cloud you probably don't care about the exact numeric values that much,
and minor fluctuations as tags are added or removed (or new items come in)
aren't going to be a big issue.
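
A minimal squid.conf sketch of that external-caching idea (the URL pattern and
lifetimes below are hypothetical; Solr 1.x sends no Expires/Cache-Control headers
itself, so the override options do the work, and Squid's default rule against
caching URLs containing query strings would also need to be relaxed):

# keep facet responses for up to an hour even though Solr sets no cache headers
refresh_pattern -i /solr/select.*facet=true 60 20% 60 override-expire override-lastmod ignore-reload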


-Hoss


RE: facet question

Posted by Gal Nitzan <ga...@gmail.com>.

> -----Original Message-----
> From: Mike Klaas [mailto:mike.klaas@gmail.com]
> Sent: Friday, June 01, 2007 12:36 AM
> To: solr-user@lucene.apache.org
> Subject: Re: facet question
>
> On 31-May-07, at 1:35 PM, Gal Nitzan wrote:
>
> >>>
> >>> However, the cache size brings us to the 2GB limit.
> >>
> >> If the cardinality of many of the tags is low, you can use HashSet-
> >> based filters (the default size at which a HashSet is used is 3000).
> > [Gal Nitzan]
> >
> > I would appreciate a pointer to documentation on HashSet-based filters,
> > thanks...
>
> http://wiki.apache.org/solr/SolrConfigXml#head-ffe19c34abf267ca2d49d9e7102feab8c79b5fb5
[Gal Nitzan]
Thanks...
>
> Scroll down to the HashDocSet comment.
>
> I'm not sure how much this will help--it depends greatly on the
> distribution of your tag values.
>
> Also, I'm still suspicious about your application.  You have 1.5M
> distinct tags for 4M documents?  That seems quite dense.
[Gal Nitzan]
Basically the facet query runs on every access to the home page (the result is
shown as a tag cloud), but it almost never changes...
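
For a tag cloud that only needs the top tags, a hedged sketch of the request
(assuming facet.limit is available in the Solr version in use; the value 100 is
purely illustrative):

facet=true&facet.field=tags&facet.limit=100&q=type:video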
Here are my cache stats:
description: 	 LRU Cache(maxSize=1000000, initialSize=600000, 
autowarmCount=300000, 
regenerator=org.apache.solr.search.SolrIndexSearcher$1@e2f574)
stats: 	lookups : 230538968
hits : 228863426
hitratio : 0.99
inserts : 1675543
evictions : 0
size : 944831
cumulative_lookups : 230538968
cumulative_hits : 228863426
cumulative_hitratio : 0.99
cumulative_inserts : 1675543
cumulative_evictions : 0

>
> -Mike



Re: facet question

Posted by Mike Klaas <mi...@gmail.com>.
On 31-May-07, at 1:35 PM, Gal Nitzan wrote:

>>>
>>> However, the cache size brings us to the 2GB limit.
>>
>> If the cardinality of many of the tags is low, you can use HashSet-
>> based filters (the default size at which a HashSet is used is 3000).
> [Gal Nitzan]
>
> I would appreciate a pointer to documentation on HashSet-based filters,
> thanks...

http://wiki.apache.org/solr/SolrConfigXml#head-ffe19c34abf267ca2d49d9e7102feab8c79b5fb5

Scroll down to the HashDocSet comment.

I'm not sure how much this will help--it depends greatly on the  
distribution of your tag values.

Also, I'm still suspicious about your application.  You have 1.5M  
distinct tags for 4M documents?  That seems quite dense.

-Mike

RE: facet question

Posted by Gal Nitzan <ga...@gmail.com>.

> -----Original Message-----
> From: Mike Klaas [mailto:mike.klaas@gmail.com]
> Sent: Thursday, May 31, 2007 9:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet question
>
> On 31-May-07, at 1:33 AM, Gal Nitzan wrote:
>
> > Hi,
> >
> > We have a small index with about 4 million docs.
> >
> > On this index we have a field "tags" which is a multi-valued field.
> >
> > Running a facet query on the index with something like:
> > facet=true&facet.field=tags&q=type:video takes about 1 minute.
> >
> > We have defined a large cache which enables the query to run much
> > faster
> > (about 1 sec)
> >
> > <filterCache
> >  class="solr.LRUCache"
> >  size="1500000"
> >  initialSize="600000"
> >  autowarmCount="300000"/>
> >
> >
> > However, the cache size brings us to the 2GB limit.
>
> If the cardinality of many of the tags is low, you can use HashSet-
> based filters (the default size at which a HashSet is used is 3000).
[Gal Nitzan]

I would appreciate a pointer to documentation on HashSet-based filters,
thanks...


>
> Do you really have 1.5M unique values in that field?  Are you
> analyzing the field (you probably shouldn't be)?

[Gal Nitzan]
No, it is not analyzed; just indexed and stored.



>
> -Mike
[Gal Nitzan]




Re: facet question

Posted by Mike Klaas <mi...@gmail.com>.
On 31-May-07, at 1:33 AM, Gal Nitzan wrote:

> Hi,
>
> We have a small index with about 4 million docs.
>
> On this index we have a field "tags" which is a multi-valued field.
>
> Running a facet query on the index with something like:
> facet=true&facet.field=tags&q=type:video takes about 1 minute.
>
> We have defined a large cache which enables the query to run much  
> faster
> (about 1 sec)
>
> <filterCache
>  class="solr.LRUCache"
>  size="1500000"
>  initialSize="600000"
>  autowarmCount="300000"/>
>
>
> However, the cache size brings us to the 2GB limit.

If the cardinality of many of the tags is low, you can use HashSet- 
based filters (the default size at which a HashSet is used is 3000).

Do you really have 1.5M unique values in that field?  Are you
analyzing the field (you probably shouldn't be)?

-Mike

Re: facet question

Posted by Yonik Seeley <yo...@apache.org>.
On 5/31/07, Gal Nitzan <ga...@gmail.com> wrote:
> We have a small index with about 4 million docs.
>
> On this index we have a field "tags" which is a multi-valued field.
>
> Running a facet query on the index with something like:
> facet=true&facet.field=tags&q=type:video takes about 1 minute.
>
> We have defined a large cache which enables the query to run much faster
> (about 1 sec)
>
> <filterCache
>  class="solr.LRUCache"
>  size="1500000"
>  initialSize="600000"
>  autowarmCount="300000"/>
>
>
> However, the cache size brings us to the 2GB limit.

To reduce memory usage, you could try setting the facet.enum.cache.minDf
parameter to a low value (on a recent nightly build, soon 1.2).  If that
slows things down too much and your index is not optimized, then you
could try optimizing it.
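
A hedged sketch of what such a request might look like (the threshold of 100 is
purely illustrative; terms whose document frequency falls below it are counted
without being stored in the filterCache):

facet=true&facet.field=tags&facet.enum.cache.minDf=100&q=type:video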

-Yonik