Posted to solr-user@lucene.apache.org by Michael Lackhoff <mi...@lackhoff.de> on 2008/11/01 06:07:24 UTC

Re: date range query performance

On 31.10.2008 19:16 Chris Hostetter wrote:

> For the record, you don't need to index as a "StrField" to get this
> benefit; you can still index using DateField, you just need to round your
> dates to some less granular level. If you always want to round down, you
> don't even need to do the rounding yourself: just add "/SECOND",
> "/MINUTE", or "/HOUR" to each of your dates before sending them to Solr.
> (SOLR-741 proposes adding a config option to DateField to let this be done
> server side.)

Is this also possible for the timestamp that is automatically added to
all new/updated docs? I would like to be able to search (quickly) for
everything that was added within the last week or month or whatever. And
because I update the index only once a day, a granularity of /DAY (if that
exists) would be fine.

- Michael

Re: date range query performance

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Jan 7, 2009 at 7:47 AM, Jim Adams <ja...@gmail.com> wrote:

> Can someone explain what this means to me?
>
> I'm having a similar performance issue - it's an index with only 1 million
> records or so, but when trying to search on a date range it takes 30
> seconds!  Yes, this date is one with hours, minutes, seconds in them -- do I
> need to create an additional field without the time component and reindex
> all my documents so I can get decent search performance?  Or can I tell Solr
> "Please ignore the time and do something in a reasonable timeframe" (GRIN)
>
>
Range queries are slow if you have a large number of unique terms. With
dates this is especially a problem because the more precise they are, the
more unique terms you have in that field.

The easy solution is to round off your dates to the minimum precision
acceptable for your use case. You'll need to re-index.
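
To get a feel for how much rounding helps, here is a minimal Python sketch
(not Solr code). The round_down helper only imitates the /SECOND, /MINUTE,
/HOUR and /DAY rounding discussed in this thread, and the timestamps are
randomly generated sample data, so the counts are illustrative only.

```python
import random
from datetime import datetime, timedelta, timezone


def round_down(ts, unit):
    """Truncate a datetime to the start of the given unit (imitates rounding down)."""
    if unit == "SECOND":
        return ts.replace(microsecond=0)
    if unit == "MINUTE":
        return ts.replace(second=0, microsecond=0)
    if unit == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def unique_terms(timestamps, unit):
    """Count the distinct values that would be indexed after rounding."""
    return len({round_down(ts, unit) for ts in timestamps})


def sample_timestamps(n, days=30, seed=0):
    """Generate n pseudo-random timestamps spread over `days` days (made-up data)."""
    rng = random.Random(seed)
    start = datetime(2009, 1, 1, tzinfo=timezone.utc)
    span_ms = days * 24 * 60 * 60 * 1000
    return [start + timedelta(milliseconds=rng.randrange(span_ms)) for _ in range(n)]


if __name__ == "__main__":
    stamps = sample_timestamps(10_000)
    for unit in ("SECOND", "MINUTE", "HOUR", "DAY"):
        print(unit, unique_terms(stamps, unit))
```

The coarser the unit, the fewer distinct values a range query has to
enumerate, which is the whole point of the advice above.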

-- 
Regards,
Shalin Shekhar Mangar.

Re: date range query performance

Posted by Erick Erickson <er...@gmail.com>.
You'll have to search the archives for a more complete explanation; I'm
going from memory here (or perhaps it's on the Wiki, I don't remember).

The notion is to break apart your timestamp (if you really, really need the
precision) into several fields rather than one, i.e. index the YYYYMMDD
as one field, then perhaps HHSS as a second field and, perhaps, milliseconds
as a third field. This *greatly* reduces the number of unique terms in each
field and should improve searching on ranges, not to mention sorting. You'll
have to manipulate the timestamp part of the query.
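
A rough Python sketch of the splitting idea, under some assumptions: the
field names ts_date, ts_time and ts_millis are hypothetical, and the time
part is written as HHMMSS (hours, minutes, seconds), which is one reading of
the second field described above. The query helper shows the kind of
manipulation needed on the query side for a whole-day range.

```python
from datetime import datetime, timezone


def split_timestamp(ts):
    """Split one precise timestamp into three low-cardinality string fields.

    Field names are hypothetical; use whatever your schema defines.
    """
    return {
        "ts_date": ts.strftime("%Y%m%d"),              # at most one value per day
        "ts_time": ts.strftime("%H%M%S"),              # at most 86,400 values
        "ts_millis": f"{ts.microsecond // 1000:03d}",  # at most 1,000 values
    }


def day_range_query(start, end, field="ts_date"):
    """Build a range query over the date-only field (inclusive on both ends).

    Fixed-width YYYYMMDD strings sort lexicographically in date order,
    so a plain string range behaves like a date range.
    """
    return f"{field}:[{start.strftime('%Y%m%d')} TO {end.strftime('%Y%m%d')}]"


if __name__ == "__main__":
    ts = datetime(2009, 1, 7, 6, 3, 5, 123000, tzinfo=timezone.utc)
    print(split_timestamp(ts))
    print(day_range_query(datetime(2009, 1, 1), datetime(2009, 1, 7)))
```

Ranges that cut through the middle of a day need extra clauses on the time
field, which is why simply rounding the date is usually less work.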

There are variations on the scheme; you could have 6 fields, for instance:
YYYY, MM, DD, HH, SS, MS. Or....

But the very best solution is to do as Erik (no relation) H. suggests: just
reindex with, say, day granularity if that's fine enough.

Best
Erick


Re: date range query performance

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 6, 2009, at 9:17 PM, Jim Adams wrote:
> Can someone explain what this means to me?

The <field> definition below sets the timestamp field without time
granularity, just the day.  It's the difference between, say, having
indexed a document for every millisecond in a day (what is that,
86.4M?), and a single term for the single date.

> I'm having a similar performance issue - it's an index with only 1 million
> records or so, but when trying to search on a date range it takes 30
> seconds!  Yes, this date is one with hours, minutes, seconds in them -- do I
> need to create an additional field without the time component and reindex
> all my documents so I can get decent search performance?  Or can I tell Solr
> "Please ignore the time and do something in a reasonable timeframe" (GRIN)

Do you care about milliseconds, seconds, minutes, or hours in terms of  
searching?  If not, it's a very good idea to reduce the granularity  
and thus the number of unique terms.
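
The 86.4M figure checks out, and the same arithmetic gives an upper bound
for any granularity. The Python sketch below is only back-of-the-envelope
math: the unit names are labels for this table, and max_terms is a
hypothetical helper, not something Solr provides.

```python
# Milliseconds in one day: 24 h * 60 min * 60 s * 1000 ms = 86,400,000.
MS_PER_DAY = 24 * 60 * 60 * 1000

# Most distinct values a single calendar day can contribute at each precision.
DISTINCT_VALUES_PER_DAY = {
    "MILLISECOND": MS_PER_DAY,
    "SECOND": 24 * 60 * 60,
    "MINUTE": 24 * 60,
    "HOUR": 24,
    "DAY": 1,
}


def max_terms(days, unit, doc_count):
    """Upper bound on unique terms in the field.

    The bound is limited both by the calendar (days * values per day)
    and by the number of documents (one value per document at most,
    assuming a single-valued field).
    """
    return min(days * DISTINCT_VALUES_PER_DAY[unit], doc_count)


if __name__ == "__main__":
    # Roughly the case from this thread: 1 million documents, here assumed
    # to be spread over one year.
    for unit in DISTINCT_VALUES_PER_DAY:
        print(unit, max_terms(365, unit, 1_000_000))
```

With second precision the bound is set by the document count; with day
precision it is set by the calendar.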

	Erik


>
>
> Thanks.
>
> On Fri, Oct 31, 2008 at 10:28 PM, Michael Lackhoff <michael@lackhoff.de> wrote:
>
>> On 01.11.2008 06:10 Erik Hatcher wrote:
>>
>>> Yeah, this should work fine:
>>>
>>>    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW/DAY" multiValued="false"/>
>>
>> Wow, that was fast, thanks!
>>
>> -Michael
>>


Re: date range query performance

Posted by Jim Adams <ja...@gmail.com>.
Can someone explain what this means to me?

I'm having a similar performance issue - it's an index with only 1 million
records or so, but when trying to search on a date range it takes 30
seconds!  Yes, this date is one with hours, minutes, seconds in them -- do I
need to create an additional field without the time component and reindex
all my documents so I can get decent search performance?  Or can I tell Solr
"Please ignore the time and do something in a reasonable timeframe" (GRIN)

Thanks.

On Fri, Oct 31, 2008 at 10:28 PM, Michael Lackhoff <mi...@lackhoff.de> wrote:

> On 01.11.2008 06:10 Erik Hatcher wrote:
>
> > Yeah, this should work fine:
> >
> >     <field name="timestamp" type="date" indexed="true" stored="true" default="NOW/DAY" multiValued="false"/>
>
> Wow, that was fast, thanks!
>
> -Michael
>

Re: date range query performance

Posted by Michael Lackhoff <mi...@lackhoff.de>.
On 01.11.2008 06:10 Erik Hatcher wrote:

> Yeah, this should work fine:
>
>     <field name="timestamp" type="date" indexed="true" stored="true" default="NOW/DAY" multiValued="false"/>

Wow, that was fast, thanks!

-Michael

Re: date range query performance

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 1, 2008, at 1:07 AM, Michael Lackhoff wrote:

> On 31.10.2008 19:16 Chris Hostetter wrote:
>
>> For the record, you don't need to index as a "StrField" to get this
>> benefit; you can still index using DateField, you just need to round your
>> dates to some less granular level. If you always want to round down, you
>> don't even need to do the rounding yourself: just add "/SECOND",
>> "/MINUTE", or "/HOUR" to each of your dates before sending them to Solr.
>> (SOLR-741 proposes adding a config option to DateField to let this be done
>> server side.)
>
> Is this also possible for the timestamp that is automatically added to
> all new/updated docs? I would like to be able to search (quickly) for
> everything that was added within the last week or month or whatever. And
> because I update the index only once a day, a granularity of /DAY (if that
> exists) would be fine.

Yeah, this should work fine:

    <field name="timestamp" type="date" indexed="true" stored="true"
           default="NOW/DAY" multiValued="false"/>
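
For the "added within the last week" search, a client could build the range
query with the same date-math rounding. This is a hedged Python sketch: the
helper names are made up, the base URL (host, port, path) is an assumption
about a local install, and "timestamp" is the field defined above.

```python
from urllib.parse import urlencode


def with_rounding(date_str, unit):
    """Append date-math rounding such as '/DAY' to a date string before indexing."""
    return f"{date_str}/{unit}"


def added_within_days(days, field="timestamp"):
    """Range query for documents stamped within the last `days` days.

    Both endpoints are rounded to the day, matching the NOW/DAY default
    on the field, so documents added today are included.
    """
    return f"{field}:[NOW/DAY-{days}DAYS TO NOW/DAY]"


def select_url(base_url, query):
    """URL-encode the query as the q parameter of a select request."""
    params = urlencode({"q": query})
    return base_url + "?" + params


if __name__ == "__main__":
    # Hypothetical local URL; adjust to your own installation.
    base = "http://localhost:8983/solr/select"
    print(with_rounding("2008-10-31T19:16:42Z", "HOUR"))
    print(select_url(base, added_within_days(7)))
```

Swapping 7 for 30 gives a rough "last month" search against the same field.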

	Erik