Posted to solr-user@lucene.apache.org by Galen Pahlke <pa...@gmail.com> on 2008/07/11 01:55:55 UTC

Search slow on a field with many unique values (date)

Hi all,
I have an index with 40 million small records with about 10 fields each.
As my index size grows, I've noticed that queries involving the date field
(range queries, order by, etc.) are taking a disproportionately long time.
Could this perhaps be because a date field has so many possible unique
values? I don't know how to find out exactly, but I'd guess there are at
least a few million unique dates in the index. Would increasing the
granularity of the date field so that there are fewer unique values likely
increase search speed? Any other suggestions?

Thanks,
- Galen

Re: Search slow on a field with many unique values (date)

Posted by Norberto Meijome <fr...@meijome.net>.
On Thu, 10 Jul 2008 17:55:55 -0600
"Galen Pahlke" <pa...@gmail.com> wrote:

> Could this perhaps be because a date field has so many possible unique
> values?  I don't know how to find out exactly, but I'd guess there are
> at least a few million unique dates in the index.  Would increasing the
> granularity of the date field so that there are fewer unique values likely
> increase search speed?

Surely you mean decreasing the granularity... i.e., going from dates that
include milliseconds to something like seconds, or even minutes, can make a
big difference.
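To make the effect of coarser granularity concrete, here is a minimal Python sketch on synthetic data (one record every 250 ms over one hour). The `truncate` and `unique_terms` helpers are hypothetical and not part of Solr; the assumption is that the rounding happens before documents are sent for indexing, so each distinct rounded value becomes one indexed term.

```python
from datetime import datetime, timedelta

def truncate(ts: datetime, unit: str) -> datetime:
    """Hypothetical helper: round a timestamp down to a coarser unit,
    as you might do before sending the document to be indexed."""
    if unit == "second":
        return ts.replace(microsecond=0)
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")

def unique_terms(timestamps, unit=None):
    """Count distinct values at the given granularity (None = unrounded)."""
    if unit is None:
        return len(set(timestamps))
    return len({truncate(ts, unit) for ts in timestamps})

if __name__ == "__main__":
    # Synthetic data: one record every 250 ms for one hour (14,400 records).
    start = datetime(2008, 7, 10, 17, 0, 0)
    stamps = [start + timedelta(milliseconds=250 * i) for i in range(4 * 3600)]
    for unit in (None, "second", "minute", "day"):
        print(unit or "millisecond", unique_terms(stamps, unit))
```

On this data the distinct-value count drops from 14,400 (milliseconds) to 3,600 (seconds), 60 (minutes) and 1 (day). Real data will differ, but the number of unique terms is bounded by the number of time buckets the data spans, which is what makes the rounding pay off.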

Alternatively, what I've done on date fields that need fast searches is to
convert them at data loading time into a numeric field of the form YYYYMMDD
or YYYYMMDDHHMM. It won't let you query directly for NOW - 4, but you can do
the calculation when preparing the search and get the same result.
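A minimal Python sketch of that approach, assuming "NOW - 4" means "the last four days". The field name `date_int` and the helper functions are hypothetical; substitute whatever your schema and loader use. The only Solr-specific part is the standard `field:[low TO high]` range syntax.

```python
from datetime import date, datetime, timedelta

# Hypothetical field name; use whatever your schema defines.
DATE_INT_FIELD = "date_int"

def to_yyyymmdd(d: date) -> int:
    """Encode a date as an integer of the form YYYYMMDD, e.g. 20080710."""
    return d.year * 10000 + d.month * 100 + d.day

def to_yyyymmddhhmm(ts: datetime) -> int:
    """Encode a timestamp as an integer of the form YYYYMMDDHHMM."""
    return to_yyyymmdd(ts.date()) * 10000 + ts.hour * 100 + ts.minute

def last_n_days_query(n: int, today: date) -> str:
    """Build a range query for the last n days, doing the date arithmetic
    on the client instead of relying on NOW-based date math in the query."""
    lo = to_yyyymmdd(today - timedelta(days=n))
    hi = to_yyyymmdd(today)
    return f"{DATE_INT_FIELD}:[{lo} TO {hi}]"

if __name__ == "__main__":
    print(to_yyyymmddhhmm(datetime(2008, 7, 10, 17, 55)))  # 200807101755
    print(last_n_days_query(4, date(2008, 7, 10)))  # date_int:[20080706 TO 20080710]
```

Because every encoded value has the same number of digits, numeric order and string order agree, so range queries behave correctly either way. Note that a YYYYMMDDHHMM value (12 digits) does not fit in a signed 32-bit integer, so that variant needs a long-sized field.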

B

_________________________
{Beto|Norberto|Numard} Meijome

Re: Search slow on a field with many unique values (date)

Posted by Mike Klaas <mi...@gmail.com>.
On 10-Jul-08, at 4:55 PM, Galen Pahlke wrote:

> Hi all,
> I have an index with 40 million small records with about 10 fields each.
> As my index size grows, I've noticed that queries involving the date field
> (range queries, order by, etc) are taking a disproportionately long time.
> Could this perhaps be because a date field has so many possible unique
> values?  I don't know how to find out exactly, but I'd guess there are
> at least a few million unique dates in the index.  Would increasing the
> granularity of the date field so that there are fewer unique values likely
> increase search speed?  Any other suggestions?

Yep, "increasing" (I'd be tempted to say "decreasing") the granularity
here can make a huge difference, both in terms of speed and memory
consumption (especially for range queries).
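Why range queries in particular benefit: a term-based range query has to walk every distinct indexed term between its bounds. The Python sketch below is a simplified model of that, not Lucene's actual code; it assumes dates are indexed as fixed-width strings (so string order matches chronological order), and the helper names and synthetic data (one record per minute for 30 days) are hypothetical.

```python
import bisect
from datetime import datetime, timedelta

def term_dictionary(timestamps, fmt):
    """Sorted, de-duplicated terms, modelling an index where each timestamp
    is stored as a fixed-width string in the given strftime format."""
    return sorted({ts.strftime(fmt) for ts in timestamps})

def terms_in_range(terms, start, end, fmt):
    """Count the distinct terms a term-enumerating range query would walk
    for the inclusive range [start, end]."""
    lo, hi = start.strftime(fmt), end.strftime(fmt)
    return bisect.bisect_right(terms, hi) - bisect.bisect_left(terms, lo)

if __name__ == "__main__":
    # Synthetic data: one record per minute for 30 days.
    first = datetime(2008, 6, 11)
    stamps = [first + timedelta(minutes=i) for i in range(30 * 24 * 60)]
    week_start, week_end = datetime(2008, 7, 4), datetime(2008, 7, 10, 23, 59)
    for label, fmt in (("minute", "%Y%m%d%H%M"), ("day", "%Y%m%d")):
        terms = term_dictionary(stamps, fmt)
        print(label, len(terms), terms_in_range(terms, week_start, week_end, fmt))
```

Here a one-week range walks 10,080 terms at minute granularity but only 7 at day granularity, while matching the same documents.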

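It would also be wise to make sure queries of these types (range queries
and sorts on the date field) are in your searcher warmup, e.g. the queries
run by the firstSearcher/newSearcher listeners in solrconfig.xml, so the
cost is paid before a new searcher starts serving requests.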

-Mike