Posted to solr-user@lucene.apache.org by Dmitry Kan <so...@gmail.com> on 2014/05/09 12:41:41 UTC

date range queries efficiency

Hi,

There was a mention, either on the Solr wiki or on this list, that rounding
down the range values helps to optimize date range queries.

For example, if a range query is:

DateTime:[NOW-3DAYS TO NOW]

then, if millisecond precision is not required, we can safely round the
bounds down to a day or an hour, for example:

DateTime:[NOW-3DAYS/DAY TO NOW/DAY]
DateTime:[NOW-3DAYS/HOUR TO NOW/HOUR]
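
To make the effect of the rounding concrete, here is a minimal Python sketch
of what /DAY and /HOUR do to the two bounds. It assumes UTC (Solr's default
for date math) and a fixed, made-up value for NOW; round_down is a
hypothetical helper for illustration, not a Solr API.

```python
from datetime import datetime, timedelta, timezone


def round_down(ts: datetime, unit: str) -> datetime:
    """Truncate a timestamp to the start of its day or hour."""
    if unit == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


# Made-up value standing in for NOW.
now = datetime(2014, 5, 9, 12, 41, 41, 123000, tzinfo=timezone.utc)

# DateTime:[NOW-3DAYS/DAY TO NOW/DAY]
day_lower = round_down(now - timedelta(days=3), "DAY")
day_upper = round_down(now, "DAY")

# DateTime:[NOW-3DAYS/HOUR TO NOW/HOUR]
hour_lower = round_down(now - timedelta(days=3), "HOUR")
hour_upper = round_down(now, "HOUR")

print(day_lower.isoformat(), "TO", day_upper.isoformat())
print(hour_lower.isoformat(), "TO", hour_upper.isoformat())
```

Note that NOW/DAY as the upper bound is the start of the current day, so
documents from today fall outside the range; [NOW-3DAYS/DAY TO NOW/DAY+1DAY]
is one common way to include them.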

What I'm wondering is which other optimizations would make sense on the
indexing side. Luke shows that Solr stores dates as longs with millisecond
precision, so this seems to use the efficient Lucene numeric range queries
internally.

If we do not need msec precision on dates during search, does it make sense
to also "round" dates down during indexing? Are there any other tips and
tricks for efficient date range queries?

Thanks!

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: date range queries efficiency

Posted by Jack Krupansky <ja...@basetechnology.com>.
My e-book has an example of an update processor that rounds to any specified
resolution (e.g., day, year, hour).

The performance reason was for filter queries, to keep the number of distinct
filters down, not for random user queries, which should be fine unrounded. The
one catch is that unrounded dates can't be used for exact matches such as a
year without expanding the date to a range for the full interval.
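
As an illustration of expanding a year to a range for the full interval, here
is a small Python sketch that builds the query string. The helper name
year_range_query and the field name DateTime are assumptions made for the
example; the mixed [ ... } brackets (inclusive lower bound, exclusive upper
bound) require Solr 4.0 or later.

```python
def year_range_query(field: str, year: int) -> str:
    """Build a range query that matches every timestamp inside one year."""
    lower = f"{year}-01-01T00:00:00Z"
    upper = f"{year + 1}-01-01T00:00:00Z"
    # '[' keeps the first instant of the year, '}' drops the first
    # instant of the following year.
    return f"{field}:[{lower} TO {upper}}}"


query = year_range_query("DateTime", 2014)
print(query)
```

The exclusive upper bound avoids matching a document stamped exactly at
midnight on January 1 of the following year.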

-- Jack Krupansky

Re: date range queries efficiency

Posted by Dmitry Kan <so...@gmail.com>.
Thanks, Erick!


-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: date range queries efficiency

Posted by Erick Erickson <er...@gmail.com>.
This might be useful:
http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

Best,
Erick

Re: date range queries efficiency

Posted by Dmitry Kan <so...@gmail.com>.
Thanks, Jack, Alex and Shawn.

This makes sense. One win of rounding down on the indexing side is
saving index space, according to hoss (reply over IRC):

"with the TrieDateFields, rounding dates at indexing time won't have any
effect on the cachability of the rounded queries, and even for non cached
queries it shouldn't affect the performance much -- but yes, it would help
reduce index size"
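
A rough way to see why rounding at indexing time could shrink the index is to
count distinct values before and after rounding. The Python sketch below uses
made-up data (one hypothetical document per second for one day) and only
counts distinct timestamps; it does not measure an actual Solr index, and the
real saving would depend on the data and the field configuration.

```python
from datetime import datetime, timedelta, timezone

start = datetime(2014, 5, 9, tzinfo=timezone.utc)

# One hypothetical document per second for a full day.
timestamps = [start + timedelta(seconds=s) for s in range(86_400)]

# Distinct values when the timestamps are indexed as they are.
exact_values = set(timestamps)

# Distinct values when each timestamp is rounded down to the hour first.
hourly_values = {
    ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
}

print(len(exact_values), "distinct values unrounded")
print(len(hourly_values), "distinct values rounded to the hour")
```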

I haven't tried it myself; I just wanted to ask whether somebody has already tried it.

Dmitry


-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: date range queries efficiency

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/15/2014 1:34 AM, Alexandre Rafalovitch wrote:
> I thought the date math rounding was for _caching_ the repeated
> queries, not so much the speed of the query itself.

Absolutely correct.  When NOW is used without rounding, caching is
completely ineffective.  This is because if the same query using NOW is
sent multiple times several seconds apart, every one of those queries
will be different after they are parsed and NOW is converted to an
actual timestamp.
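
The effect can be sketched in Python by resolving the same query at three
arrival times a few seconds apart and counting how many distinct ranges come
out. The arrival times are made up and resolved_range is a hypothetical
helper; the set of resolved ranges stands in for the keys a cache would see,
and this is not how Solr builds its cache keys internally.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(ts: datetime) -> int:
    """Convert a timestamp to whole milliseconds since the epoch."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def resolved_range(now: datetime, round_to_hour: bool) -> tuple:
    """Resolve [NOW-3DAYS TO NOW], optionally rounding both bounds to the hour."""
    lower, upper = now - timedelta(days=3), now
    if round_to_hour:
        lower = lower.replace(minute=0, second=0, microsecond=0)
        upper = upper.replace(minute=0, second=0, microsecond=0)
    return to_millis(lower), to_millis(upper)


# The same query text arriving three times, five seconds apart.
first = datetime(2014, 5, 15, 1, 34, 0, tzinfo=timezone.utc)
arrivals = [first + timedelta(seconds=5 * i) for i in range(3)]

unrounded_keys = {resolved_range(t, round_to_hour=False) for t in arrivals}
rounded_keys = {resolved_range(t, round_to_hour=True) for t in arrivals}

print(len(unrounded_keys), "distinct ranges without rounding")
print(len(rounded_keys), "distinct range with /HOUR rounding")
```

With rounding, every request within the same hour resolves to the same range,
so a cached result can be reused until the hour rolls over.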

> Also, if you are using TrieDateField, the precisionStep value controls how
> the optimization is done. Values are bucketed at different levels of
> precision, so the range search works at the least granular level
> first, etc.

Some nitty-gritty details of how range queries are accelerated with the
Trie data types and precisionStep are described in the Javadoc for
NumericRangeQuery:

http://lucene.apache.org/core/4_8_0/core/org/apache/lucene/search/NumericRangeQuery.html

Thanks,
Shawn


Re: date range queries efficiency

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I thought the date math rounding was for _caching_ the repeated
queries, not so much the speed of the query itself.

Also, if you are using TrieDateField, the precisionStep value controls how
the optimization is done. Values are bucketed at different levels of
precision, so the range search works at the least granular level
first, etc.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

