Posted to solr-user@lucene.apache.org by Furkan Kuru <fu...@gmail.com> on 2010/06/04 16:31:18 UTC

Faceted Search Slows Down as index gets larger

Hello,

I have been dealing with real-time data.

As the total number of indexed documents grows (now 5 M), a faceted search
on a text field, restricted by creation time, slows down. We use this search
to find the most used words across all these text fields.


query string: created_time:[NOW-1HOUR TO NOW] facet.field=text
facet.mincount=1

The number of documents matching the query is around 9000.

It takes around 80 seconds on a decent computer with 4 GB RAM and a quad-core CPU.

I do not know the internal details of how terms are indexed and counted for
faceting.

Any suggestions for speeding up this query are appreciated.

Thanks in advance.

-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by Lance Norskog <go...@gmail.com>.
The Distributed Search feature assumes that a document only exists in
one core. Updating a doc in a small core will fail because it may be
found twice.

If you are only updating a popularity score, and only need it for
boosting (but not for searching on a value), there is a feature called
the ExternalFileField:

http://www.lucidimagination.com/search/document/CDRG_ch04_4.4.4?q=ExternalFileField
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
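
As far as the ExternalFileField documentation describes it, the values live
in a plain file named external_<fieldname> inside the index data directory,
one key=value line per document. A minimal sketch of generating such a file
follows. The field name "popularity", the document keys, and the scores are
made up for illustration, and a temporary directory stands in for the real
data directory.

```python
import os
import tempfile


def write_external_file(data_dir, field_name, scores):
    """Write key=value lines to <data_dir>/external_<field_name>."""
    path = os.path.join(data_dir, f"external_{field_name}")
    with open(path, "w", encoding="utf-8") as fh:
        # Sort by key so the file is stable between runs.
        for doc_key, score in sorted(scores.items()):
            fh.write(f"{doc_key}={score}\n")
    return path


# Hypothetical keys and scores; the temp dir is a stand-in for Solr's data dir.
data_dir = tempfile.mkdtemp()
path = write_external_file(data_dir, "popularity", {"doc1": 3.5, "doc2": 0.25})
with open(path, encoding="utf-8") as fh:
    content = fh.read()
print(content)
```

Because the scores sit outside the index, refreshing them should not require
re-indexing the documents themselves, which is the point of the suggestion
above.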

On Sun, Jun 6, 2010 at 10:26 PM, Andy <an...@yahoo.com> wrote:
> Yonik,
>
> Is there any documentation where I can read more about the big core + small core setup?
>
> One issue for me is that I don't just add new documents. Many of the changes is to update existing documents, such as updating the popularity score of the documents. Would the big core + small core strategy still work in this case? If not, is there any other way to mitigate the cache re-building problem of facet search?
>
> --- On Sun, 6/6/10, Yonik Seeley <yo...@lucidimagination.com> wrote:
>
>> From: Yonik Seeley <yo...@lucidimagination.com>
>> Subject: Re: Faceted Search Slows Down as index gets larger
>> To: solr-user@lucene.apache.org
>> Date: Sunday, June 6, 2010, 1:54 PM
>> On Sun, Jun 6, 2010 at 1:12 PM,
>> Furkan Kuru <fu...@gmail.com>
>> wrote:
>> > We try to provide real-time search. So the index is
>> changing almost in every
>> > minute.
>> >
>> > We commit for every 100 documents received.
>> >
>> > The facet search is executed every 5 mins.
>>
>> OK, that's the problem - pretty much every facet search is
>> rebuilding
>> the facet cache, which takes most of the time (and facet.fc
>> is more
>> expensive than facet.enum in this regard).
>>
>> One strategy is to use distributed search... have some big
>> cores that
>> don't change often, and then small cores for the new stuff
>> that
>> changes rapidly.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Faceted Search Slows Down as index gets larger

Posted by Furkan Kuru <fu...@gmail.com>.
OK, I will have a look at distributed search with a multi-core Solr setup.

Thank you Yonik,

On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <fu...@gmail.com> wrote:
> > We try to provide real-time search. So the index is changing almost in
> every
> > minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
>
> OK, that's the problem - pretty much every facet search is rebuilding
> the facet cache, which takes most of the time (and facet.fc is more
> expensive than facet.enum in this regard).
>
> One strategy is to use distributed search... have some big cores that
> don't change often, and then small cores for the new stuff that
> changes rapidly.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Another thing you can try is trunk.  This specific case has been
improved by an order of magnitude recently.
The case that has been sped up is initial population of the
filterCache, or when the filterCache can't hold all of the unique
values, or when faceting is configured to not use the filterCache much
of the time via facet.enum.cache.minDf.

-Yonik
http://www.lucidimagination.com

On Thu, Dec 16, 2010 at 6:39 PM, Furkan Kuru <fu...@gmail.com> wrote:
> I am sorry for raising up this thread after 6 months.
>
> But we have still problems with faceted search on full-text fields.
>
> We try to get most frequent words in a text field that is created in 1 hour.
> The faceted search takes too much time even the matching number of documents
> (created_at within 1 HOUR) is constant (10-20K) as the total number of
> documents increases (now 20M) the query gets slower. Solr throws exceptions
> and does not respond. We have to restart and delete old docs. (3G RAM) Index
> is around 2.2 GB.
> And we store the data in solr as well. The documents are small.
>
> $response = $solr->search('created_at:[NOW-'.$hours.'HOUR TO NOW]', 0, 1,
> array( 'facet' => 'true', 'facet.field'=> $field, 'facet.mincount' => 1,
> 'facet.method' => 'enum', 'facet.enum.cache.minDf' => 100 ));
>
> Yonik had suggested distributed search. But I am not sure if we set every
> configuration correctly. For example the solr caches if they are related
> with faceted searching.
>
> We use default values:
>
> <filterCache
>       class="solr.FastLRUCache"
>       size="512"
>       initialSize="512"
>       autowarmCount="0"/>
>
>
> <queryResultCache
>       class="solr.LRUCache"
>       size="512"
>       initialSize="512"
>       autowarmCount="0"/>
>
>
>
> Any help is appreciated.
>
>
>
> On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
>>
>> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <fu...@gmail.com> wrote:
>> > We try to provide real-time search. So the index is changing almost in
>> > every
>> > minute.
>> >
>> > We commit for every 100 documents received.
>> >
>> > The facet search is executed every 5 mins.
>>
>> OK, that's the problem - pretty much every facet search is rebuilding
>> the facet cache, which takes most of the time (and facet.fc is more
>> expensive than facet.enum in this regard).
>>
>> One strategy is to use distributed search... have some big cores that
>> don't change often, and then small cores for the new stuff that
>> changes rapidly.
>>
>> -Yonik
>> http://www.lucidimagination.com
>
>
>
> --
> Furkan Kuru
>

Re: Faceted Search Slows Down as index gets larger

Posted by Furkan Kuru <fu...@gmail.com>.
I am sorry for reviving this thread after 6 months.

But we still have problems with faceted search on full-text fields.

We try to get the most frequent words in a text field from documents created
in the last hour. The faceted search takes too much time. Even though the
number of matching documents (created_at within 1 HOUR) is constant (10-20K),
the query gets slower as the total number of documents increases (now 20M).
Solr throws exceptions and stops responding. We have to restart it and delete
old docs. (3G RAM) The index is around 2.2 GB.
We also store the data in Solr. The documents are small.

$response = $solr->search('created_at:[NOW-'.$hours.'HOUR TO NOW]', 0, 1,
array( 'facet' => 'true', 'facet.field'=> $field, 'facet.mincount' => 1,
'facet.method' => 'enum', 'facet.enum.cache.minDf' => 100 ));

Yonik had suggested distributed search, but I am not sure whether we have
configured everything correctly, for example the Solr caches, if they are
related to faceted searching.

We use default values:

<filterCache
      class="solr.FastLRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>


<queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>



Any help is appreciated.



On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <fu...@gmail.com> wrote:
> > We try to provide real-time search. So the index is changing almost in
> every
> > minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
>
> OK, that's the problem - pretty much every facet search is rebuilding
> the facet cache, which takes most of the time (and facet.fc is more
> expensive than facet.enum in this regard).
>
> One strategy is to use distributed search... have some big cores that
> don't change often, and then small cores for the new stuff that
> changes rapidly.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by John Wang <jo...@gmail.com>.
Using the Zoie/Bobo combination gives you realtime faceting. (Lucene based)

http://sna-projects.com/zoie/
http://sna-projects.com/bobo/

wiki write-up:
http://snaprojects.jira.com/wiki/display/BOBO/Realtime+Faceting+with+Zoie

We can take this over to the zoie/bobo mailing list if you have questions.
We are doing realtime faceting in production here at LinkedIn serving tens
of millions of queries a day with over 70 million user profiles in the
index.

-John

On Sun, Jun 6, 2010 at 10:54 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <fu...@gmail.com> wrote:
> > We try to provide real-time search. So the index is changing almost in
> every
> > minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
>
> OK, that's the problem - pretty much every facet search is rebuilding
> the facet cache, which takes most of the time (and facet.fc is more
> expensive than facet.enum in this regard).
>
> One strategy is to use distributed search... have some big cores that
> don't change often, and then small cores for the new stuff that
> changes rapidly.
>
> -Yonik
> http://www.lucidimagination.com
>

Re: Faceted Search Slows Down as index gets larger

Posted by Andy <an...@yahoo.com>.
Yonik,

Is there any documentation where I can read more about the big core + small core setup?

One issue for me is that I don't just add new documents. Many of the changes are updates to existing documents, such as updating the popularity score of the documents. Would the big core + small core strategy still work in this case? If not, is there any other way to mitigate the cache rebuilding problem of facet search?

--- On Sun, 6/6/10, Yonik Seeley <yo...@lucidimagination.com> wrote:

> From: Yonik Seeley <yo...@lucidimagination.com>
> Subject: Re: Faceted Search Slows Down as index gets larger
> To: solr-user@lucene.apache.org
> Date: Sunday, June 6, 2010, 1:54 PM
> On Sun, Jun 6, 2010 at 1:12 PM,
> Furkan Kuru <fu...@gmail.com>
> wrote:
> > We try to provide real-time search. So the index is
> changing almost in every
> > minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
> 
> OK, that's the problem - pretty much every facet search is
> rebuilding
> the facet cache, which takes most of the time (and facet.fc
> is more
> expensive than facet.enum in this regard).
> 
> One strategy is to use distributed search... have some big
> cores that
> don't change often, and then small cores for the new stuff
> that
> changes rapidly.
> 
> -Yonik
> http://www.lucidimagination.com
> 


      

Re: Faceted Search Slows Down as index gets larger

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <fu...@gmail.com> wrote:
> We try to provide real-time search. So the index is changing almost in every
> minute.
>
> We commit for every 100 documents received.
>
> The facet search is executed every 5 mins.

OK, that's the problem - pretty much every facet search is rebuilding
the facet cache, which takes most of the time (and facet.fc is more
expensive than facet.enum in this regard).

One strategy is to use distributed search... have some big cores that
don't change often, and then small cores for the new stuff that
changes rapidly.
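
A rough sketch of what such a request could look like, using Solr's
"shards" parameter to fan one query out over two cores. The host, port, and
the core names "archive" and "recent" are assumptions made up for
illustration, as is rows=0; only the query and facet parameters come from
this thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical core locations: one large core that is rarely committed to,
# and one small core that receives the frequent commits.
BIG_CORE = "localhost:8983/solr/archive"
SMALL_CORE = "localhost:8983/solr/recent"


def build_distributed_url(field="text", hours=1):
    """Return a select URL that facets across both cores via `shards`."""
    params = {
        "q": f"created_time:[NOW-{hours}HOUR TO NOW]",
        "rows": 0,  # assumption: only the facet counts are wanted
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,
        # Comma-separated host:port/path entries, without the scheme.
        "shards": ",".join([BIG_CORE, SMALL_CORE]),
    }
    return f"http://{SMALL_CORE}/select?{urlencode(params)}"


url = build_distributed_url()
print(url)
```

The idea is that commits to the small core only invalidate that core's
caches, so the expensive structures for the big core can survive between
facet requests. Each document should live in exactly one of the cores.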

-Yonik
http://www.lucidimagination.com

Re: Faceted Search Slows Down as index gets larger

Posted by Furkan Kuru <fu...@gmail.com>.
We try to provide real-time search, so the index changes almost every
minute.

We commit after every 100 documents received.

The facet search is executed every 5 minutes.

Here are the stats after a facet search with the normal facet.method=fc (it
took 95 seconds):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=10000, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 34905
cumulative_hits : 2109
cumulative_hitratio : 0.06
cumulative_inserts : 16396
cumulative_evictions : 0


name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 3
evictions : 0
size : 3
warmupTime : 0
cumulative_lookups : 24533601
cumulative_hits : 149859
cumulative_hitratio : 0.00
cumulative_inserts : 24501766
cumulative_evictions : 24036089
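
To make the cumulative counters above easier to read, the small sketch below
recomputes the hit ratio and the share of inserted entries that were later
evicted. The numbers are copied from the statistics above; the helper name
"ratio" and the dictionary layout are only illustrative.

```python
def ratio(part, whole):
    """Return part / whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


# Cumulative counters copied from the statistics page output.
caches = {
    "fieldValueCache": {
        "lookups": 34905, "hits": 2109,
        "inserts": 16396, "evictions": 0,
    },
    "filterCache": {
        "lookups": 24533601, "hits": 149859,
        "inserts": 24501766, "evictions": 24036089,
    },
}

for name, c in caches.items():
    hit = ratio(c["hits"], c["lookups"])
    evicted = ratio(c["evictions"], c["inserts"])
    print(f"{name}: hit ratio {hit:.4f}, evicted/inserted {evicted:.4f}")
```

For the filterCache this works out to a hit ratio well under 1%, with roughly
98% of all inserted entries evicted again, which suggests a 512-entry cache
is being churned through rather than reused.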


On Sun, Jun 6, 2010 at 3:27 PM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Sun, Jun 6, 2010 at 7:38 AM, Furkan Kuru <fu...@gmail.com> wrote:
> > facet.limit = default value 100
> > facet.minCount is 1
> >
> > The document count that matches the query is 8-10K on average. I did not
> > calculate the terms (maybe using facet.limit=-1 and facet.minCount=1)
> >
> > My index entirely fits into memory.
>
> How often is the index changing (how often are you committing).
> It takes time to build the UnInvertedField structure for the first
> facet request after the index changes.
>
> Also, with the normal facet.method=fc, after you run it, go to the
> statistics page and look for the whole entry for fieldValueCache (and
> cut'n'paste it here).
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, Jun 6, 2010 at 7:38 AM, Furkan Kuru <fu...@gmail.com> wrote:
> facet.limit = default value 100
> facet.minCount is 1
>
> The document count that matches the query is 8-10K on average. I did not
> calculate the terms (maybe using facet.limit=-1 and facet.minCount=1)
>
> My index entirely fits into memory.

How often is the index changing (how often are you committing)?
It takes time to build the UnInvertedField structure for the first
facet request after the index changes.

Also, with the normal facet.method=fc, after you run it, go to the
statistics page and look for the whole entry for fieldValueCache (and
cut'n'paste it here).

-Yonik
http://www.lucidimagination.com

Re: Faceted Search Slows Down as index gets larger

Posted by Furkan Kuru <fu...@gmail.com>.
facet.limit = default value 100
facet.minCount is 1

The document count that matches the query is 8-10K on average. I did not
calculate the number of terms (maybe using facet.limit=-1 and facet.minCount=1).

My index entirely fits into memory.



On Sun, Jun 6, 2010 at 5:10 AM, Andy <an...@yahoo.com> wrote:

> This is strange.
>
> 1M unique facet terms and 10 terms per document -- sounds like this use
> case is exactly where fc would be faster. But your results  were the exact
> opposite.
>
> What value for facet.limit did you set?
>
> Was your 80/30 seconds query time spent mostly on returning the facet
> counts of all 1M of facet terms, or did you limit the number of facet terms
> returned to a small number?
>
> Also did your entire index fit within RAM?
>
>
> --- On Sat, 6/5/10, Furkan Kuru <fu...@gmail.com> wrote:
>
> > From: Furkan Kuru <fu...@gmail.com>
> > Subject: Re: Faceted Search Slows Down as index gets larger
> > To: solr-user@lucene.apache.org, yonik@lucidimagination.com
> > Date: Saturday, June 5, 2010, 8:40 AM
> > The documents full-text fields are
> > 140 chars length (tweets).
> >
> > Actually I had looked at those parameters and thought no
> > change was
> > necessary because the terms per document would be few and
> > the unique term
> > count was nearly 1 M. I don't know exactly but average term
> > count per
> > document text can be 10 in my case.
> >
> > I think I still do not get why facet.method=enum is
> > faster.
> >
> >
> > On Sat, Jun 5, 2010 at 5:00 AM, Yonik Seeley <yonik@lucidimagination.com
> >wrote:
> >
> > > On Fri, Jun 4, 2010 at 7:33 PM, Andy <an...@yahoo.com>
> > wrote:
> > > > Yonik,
> > > >
> > > > Just curious why does using enum improve the
> > facet performance.
> > > >
> > > > Furkan was faceting on a text field with each
> > word being a facet value.
> > > I'd imagine that'd mean there's a large number of
> > facet values. According to
> > > the documentation (
> > > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method)
> > > facet.method=fc is faster when a field has many unique
> > terms. So how come
> > > enum, not fc, is faster in this case?
> > >
> > > facet.method=fc is faster when there are many unique
> > terms, and
> > > relatively few terms per document.  A full-text
> > field doesn't fit that
> > > bill.
> > >
> > > > Also why use filterCache less?
> > >
> > > Takes up a lot of memory.
> > >
> > > -Yonik
> > > http://www.lucidimagination.com
> > >
> >
> >
> >
> > --
> > Furkan Kuru
> >
>
>
>
>


-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by Andy <an...@yahoo.com>.
This is strange.

1M unique facet terms and 10 terms per document -- sounds like this use case is exactly where fc would be faster. But your results were the exact opposite.

What value for facet.limit did you set?

Was your 80/30 seconds query time spent mostly on returning the facet counts of all 1M of facet terms, or did you limit the number of facet terms returned to a small number?

Also did your entire index fit within RAM?


--- On Sat, 6/5/10, Furkan Kuru <fu...@gmail.com> wrote:

> From: Furkan Kuru <fu...@gmail.com>
> Subject: Re: Faceted Search Slows Down as index gets larger
> To: solr-user@lucene.apache.org, yonik@lucidimagination.com
> Date: Saturday, June 5, 2010, 8:40 AM
> The documents full-text fields are
> 140 chars length (tweets).
> 
> Actually I had looked at those parameters and thought no
> change was
> necessary because the terms per document would be few and
> the unique term
> count was nearly 1 M. I don't know exactly but average term
> count per
> document text can be 10 in my case.
> 
> I think I still do not get why facet.method=enum is
> faster.
> 
> 
> On Sat, Jun 5, 2010 at 5:00 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:
> 
> > On Fri, Jun 4, 2010 at 7:33 PM, Andy <an...@yahoo.com>
> wrote:
> > > Yonik,
> > >
> > > Just curious why does using enum improve the
> facet performance.
> > >
> > > Furkan was faceting on a text field with each
> word being a facet value.
> > I'd imagine that'd mean there's a large number of
> facet values. According to
> > the documentation (
> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method)
> > facet.method=fc is faster when a field has many unique
> terms. So how come
> > enum, not fc, is faster in this case?
> >
> > facet.method=fc is faster when there are many unique
> terms, and
> > relatively few terms per document.  A full-text
> field doesn't fit that
> > bill.
> >
> > > Also why use filterCache less?
> >
> > Takes up a lot of memory.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> 
> 
> 
> -- 
> Furkan Kuru
> 


      

Re: Faceted Search Slows Down as index gets larger

Posted by Furkan Kuru <fu...@gmail.com>.
The documents' full-text fields are 140 characters long (tweets).

Actually I had looked at those parameters and thought no change was
necessary because the terms per document would be few and the unique term
count was nearly 1 M. I don't know exactly, but the average term count per
document can be around 10 in my case.

I think I still do not get why facet.method=enum is faster.


On Sat, Jun 5, 2010 at 5:00 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Fri, Jun 4, 2010 at 7:33 PM, Andy <an...@yahoo.com> wrote:
> > Yonik,
> >
> > Just curious why does using enum improve the facet performance.
> >
> > Furkan was faceting on a text field with each word being a facet value.
> I'd imagine that'd mean there's a large number of facet values. According to
> the documentation (
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.method)
> facet.method=fc is faster when a field has many unique terms. So how come
> enum, not fc, is faster in this case?
>
> facet.method=fc is faster when there are many unique terms, and
> relatively few terms per document.  A full-text field doesn't fit that
> bill.
>
> > Also why use filterCache less?
>
> Takes up a lot of memory.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Jun 4, 2010 at 7:33 PM, Andy <an...@yahoo.com> wrote:
> Yonik,
>
> Just curious why does using enum improve the facet performance.
>
> Furkan was faceting on a text field with each word being a facet value. I'd imagine that'd mean there's a large number of facet values. According to the documentation (http://wiki.apache.org/solr/SimpleFacetParameters#facet.method) facet.method=fc is faster when a field has many unique terms. So how come enum, not fc, is faster in this case?

facet.method=fc is faster when there are many unique terms, and
relatively few terms per document.  A full-text field doesn't fit that
bill.
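
The difference between the two methods can be pictured with a toy model.
This is a deliberately simplified sketch, not Solr's code: the real fc path
builds an UnInvertedField structure and the real enum path works with cached
per-term filters. The documents, the "base" result set, and the function
names are all invented for illustration.

```python
from collections import Counter

# Toy index: document id -> terms in its text field (made-up data).
docs = {
    1: ["solr", "facet", "slow"],
    2: ["solr", "cache"],
    3: ["facet", "cache", "enum"],
    4: ["lucene", "solr"],
}
base = {1, 2, 3}  # ids of the documents matching the main query


def facet_enum(docs, base):
    """Enum style: walk every term and intersect its doc set with `base`."""
    postings = {}
    for doc_id, terms in docs.items():
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)
    counts = {term: len(ids & base) for term, ids in postings.items()}
    # Mimic facet.mincount=1 by dropping terms with no matching documents.
    return {term: n for term, n in counts.items() if n > 0}


def facet_fc(docs, base):
    """fc style: walk every matching document and count the terms in it."""
    counts = Counter()
    for doc_id in base:
        counts.update(set(docs[doc_id]))
    return dict(counts)


# Both strategies produce the same counts; they differ in what they iterate.
assert facet_enum(docs, base) == facet_fc(docs, base)
print(facet_fc(docs, base))
```

In this picture the enum work grows with the number of unique terms, while
the fc work grows with matching documents times terms per document, on top
of the one-time cost of building the per-document term structure.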

> Also why use filterCache less?

Takes up a lot of memory.

-Yonik
http://www.lucidimagination.com

Re: Faceted Search Slows Down as index gets larger

Posted by Andy <an...@yahoo.com>.
Yonik,

Just curious: why does using enum improve facet performance?

Furkan was faceting on a text field with each word being a facet value. I'd imagine that'd mean there's a large number of facet values. According to the documentation (http://wiki.apache.org/solr/SimpleFacetParameters#facet.method) facet.method=fc is faster when a field has many unique terms. So how come enum, not fc, is faster in this case?

Also why use filterCache less?

Thanks
Andy

--- On Fri, 6/4/10, Furkan Kuru <fu...@gmail.com> wrote:

> From: Furkan Kuru <fu...@gmail.com>
> Subject: Re: Faceted Search Slows Down as index gets larger
> To: solr-user@lucene.apache.org, yonik@lucidimagination.com
> Date: Friday, June 4, 2010, 11:25 AM
> I am using 1.4 version.
> 
> I have tried your suggestion,
> 
> it takes around 25-30 seconds now.
> 
> Thank you,
> 
> 
> On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley <yo...@lucidimagination.com>wrote:
> 
> > Faceting on a full-text field is hard.
> > What version of Solr are you using?
> >
> > If it's 1.4 or later, try setting
> > facet.method=enum
> >
> > And to use the filterCache less, try
> > facet.enum.cache.minDf=100
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> > On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru <fu...@gmail.com>
> wrote:
> > > Hello,
> > >
> > > I have been dealing with real-time data.
> > >
> > > As the number of total indexed documents gets
> larger (now 5 M)
> > >
> > > a faceted search on a text field limited by the
> creation time, which we
> > use
> > > to find the most used word in all these text
> fields, gets slow down.
> > >
> > >
> > > query string: created_time:[NOW-1HOUR TO NOW]
> facet.field=text
> > > facet.mincount=1
> > >
> > > the document count matching the query is around
> 9000.
> > >
> > >
> > > It takes around 80 seconds in a decent computer
> with 4GB ram, quad core
> > cpu
> > >
> > > I do not know the internal details of term
> indexing and their counts for
> > > faceting.
> > >
> > > Any suggestion for speeding up this query is
> appreciated.
> > >
> > > Thanks in advance.
> > >
> > > --
> > > Furkan Kuru
> > >
> >
> 
> 
> 
> -- 
> Furkan Kuru
> 


      

Re: Faceted Search Slows Down as index gets larger

Posted by Furkan Kuru <fu...@gmail.com>.
I am using version 1.4.

I have tried your suggestion; it takes around 25-30 seconds now.

Thank you,


On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> Faceting on a full-text field is hard.
> What version of Solr are you using?
>
> If it's 1.4 or later, try setting
> facet.method=enum
>
> And to use the filterCache less, try
> facet.enum.cache.minDf=100
>
> -Yonik
> http://www.lucidimagination.com
>
> On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru <fu...@gmail.com> wrote:
> > Hello,
> >
> > I have been dealing with real-time data.
> >
> > As the number of total indexed documents gets larger (now 5 M)
> >
> > a faceted search on a text field limited by the creation time, which we
> use
> > to find the most used word in all these text fields, gets slow down.
> >
> >
> > query string: created_time:[NOW-1HOUR TO NOW] facet.field=text
> > facet.mincount=1
> >
> > the document count matching the query is around 9000.
> >
> >
> > It takes around 80 seconds in a decent computer with 4GB ram, quad core
> cpu
> >
> > I do not know the internal details of term indexing and their counts for
> > faceting.
> >
> > Any suggestion for speeding up this query is appreciated.
> >
> > Thanks in advance.
> >
> > --
> > Furkan Kuru
> >
>



-- 
Furkan Kuru

Re: Faceted Search Slows Down as index gets larger

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Faceting on a full-text field is hard.
What version of Solr are you using?

If it's 1.4 or later, try setting
facet.method=enum

And to use the filterCache less, try
facet.enum.cache.minDf=100
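
Put together with the original query, the request could be assembled roughly
as follows. This is only a sketch: the base URL is a placeholder for a
default local install, and rows=0 is an assumption (only facet counts are
wanted). The field names and facet parameters are the ones mentioned in this
thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL; adjust host, port, and path for your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_facet_url(field="text", hours=1, min_df=100):
    """Return a select URL that facets on `field` over the last `hours` hours."""
    params = {
        "q": f"created_time:[NOW-{hours}HOUR TO NOW]",
        "rows": 0,  # assumption: skip the documents, keep the facet counts
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,
        "facet.method": "enum",  # enumerate terms instead of un-inverting
        # Terms that occur in fewer than min_df documents bypass the filterCache.
        "facet.enum.cache.minDf": min_df,
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = build_facet_url()
print(url)
```

Raising min_df trades filterCache memory for more per-request work on rare
terms, so the value of 100 is a starting point to tune rather than a fixed
recommendation.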

-Yonik
http://www.lucidimagination.com

On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru <fu...@gmail.com> wrote:
> Hello,
>
> I have been dealing with real-time data.
>
> As the number of total indexed documents gets larger (now 5 M)
>
> a faceted search on a text field limited by the creation time, which we use
> to find the most used word in all these text fields, gets slow down.
>
>
> query string: created_time:[NOW-1HOUR TO NOW] facet.field=text
> facet.mincount=1
>
> the document count matching the query is around 9000.
>
>
> It takes around 80 seconds in a decent computer with 4GB ram, quad core cpu
>
> I do not know the internal details of term indexing and their counts for
> faceting.
>
> Any suggestion for speeding up this query is appreciated.
>
> Thanks in advance.
>
> --
> Furkan Kuru
>