Posted to solr-user@lucene.apache.org by Guillaume Smet <gu...@gmail.com> on 2007/09/27 20:40:56 UTC

Date facetting and ranges overlapping

Hi all,

I'm now using date faceting to browse events. It works really well
and is very useful. The only problem so far is that an event falling
exactly on the boundary between two ranges is counted twice.

Suppose we have a gap of 6 hours starting from 2007-09-27 12:00. The
ranges are then 2007-09-27 12:00->18:00 and 2007-09-27 18:00->
2007-09-28 00:00. An event happening exactly at 18:00 is counted in both
ranges, so if I select the first range, Solr returns both ranges in
facet_dates instead of only the first one.
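A minimal Python sketch of the double counting, assuming each computed range is inclusive of both endpoints. The helper names and the one-event list are hypothetical, for illustration only; this models the behaviour, not Solr's code.

```python
from datetime import datetime, timedelta

def date_ranges(start, end, gap):
    """Split start..end into consecutive (low, high) pairs of width `gap`."""
    ranges = []
    low = start
    while low < end:
        high = low + gap
        ranges.append((low, high))
        low = high  # the next range starts exactly where this one ends
    return ranges

def count_inclusive(ranges, events):
    """Count events per range, treating both endpoints as inclusive."""
    return [sum(1 for e in events if low <= e <= high) for low, high in ranges]

start = datetime(2007, 9, 27, 12, 0)
ranges = date_ranges(start, start + timedelta(hours=12), timedelta(hours=6))
events = [datetime(2007, 9, 27, 18, 0)]  # one event exactly on the boundary
counts = count_inclusive(ranges, events)
print(counts)  # [1, 1]: the single event is counted in both ranges
```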

Couldn't we create the ranges so that they don't overlap? Something like:
2007-09-27 12:00 -> 2007-09-27 17:59:59.999 for the first one and
2007-09-27 18:00 -> 2007-09-27 23:59:59.999 for the second one.

I don't think people use date faceting with millisecond-sized ranges, so
dropping 1 millisecond shouldn't be much of a problem in practice.
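The proposed non-overlapping layout can be sketched in Python as follows, assuming every range still includes both endpoints but its upper bound stops 1 ms before the next range starts. The function name is hypothetical.

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)

def non_overlapping_ranges(start, end, gap):
    """Ranges whose inclusive upper bound is 1 ms before the next range starts."""
    ranges = []
    low = start
    while low < end:
        ranges.append((low, low + gap - ONE_MS))
        low += gap
    return ranges

start = datetime(2007, 9, 27, 12, 0)
ranges = non_overlapping_ranges(start, start + timedelta(hours=12), timedelta(hours=6))
for low, high in ranges:
    print(low.isoformat(timespec="milliseconds"), "->",
          high.isoformat(timespec="milliseconds"))
# 2007-09-27T12:00:00.000 -> 2007-09-27T17:59:59.999
# 2007-09-27T18:00:00.000 -> 2007-09-27T23:59:59.999

event = datetime(2007, 9, 27, 18, 0)
counts = [int(low <= event <= high) for low, high in ranges]
print(counts)  # [0, 1]: the boundary event now lands in one range only
```

An event timestamped between .999 and the next whole second would fall in no range, which is the millisecond trade-off accepted above.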

Thanks for any comment.

--
Guillaume

Re: Date facetting and ranges overlapping

Posted by Guillaume Smet <gu...@gmail.com>.
On 9/27/07, Chris Hostetter <ho...@fucit.org> wrote:
> The simple workaround: if you know all of your data is indexed with
> whole-second precision (milliseconds always .000), then put "-1MILLI" at
> the end of your start and end date faceting params.

It fixed my problem. Thanks.

--
Guillaume

Re: Date facetting and ranges overlapping

Posted by Guillaume Smet <gu...@gmail.com>.
On 9/27/07, Chris Hostetter <ho...@fucit.org> wrote:
> a better option (assuming a query parser change) would be a new option
> that says whether each computed range should be inclusive of the low point,
> the high point, both end points, or neither end point, or be "smart" (where
> smart is the same as "low" except for the last range, where it includes
> both)

That could be really cool.

> The simple workaround: if you know all of your data is indexed with
> whole-second precision (milliseconds always .000), then put "-1MILLI" at
> the end of your start and end date faceting params.

Good idea. The only problem is that I'll have to modify my client code
to deal with the fact that Solr now returns 17:59:59.999 instead of
18:00:00. Not difficult, but less clean than before.
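A minimal Python sketch of that client-side fix, assuming the facet_dates keys come back as UTC timestamps with millisecond precision and a trailing "Z" (e.g. "2007-09-27T17:59:59.999Z"). The function name is hypothetical.

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)

def normalize_key(key):
    """Undo the -1MILLI shift on a facet key, e.g. '...17:59:59.999Z' -> '...18:00:00Z'."""
    # Assumed key format: ISO-8601 UTC with fractional seconds and a trailing 'Z'.
    shifted = datetime.strptime(key, "%Y-%m-%dT%H:%M:%S.%fZ")
    return (shifted + ONE_MS).strftime("%Y-%m-%dT%H:%M:%SZ")

print(normalize_key("2007-09-27T17:59:59.999Z"))  # 2007-09-27T18:00:00Z
```

Keeping the shift in one helper means the rest of the client can keep working with the original whole-hour boundaries.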

Thanks for the advice. I'll give it a try.

--
Guillaume

Re: Date facetting and ranges overlapping

Posted by Chris Hostetter <ho...@fucit.org>.
: I'm now using date faceting to browse events. It works really well
: and is very useful. The only problem so far is that an event falling
: exactly on the boundary between two ranges is counted twice.

Yeah, this is one of the big caveats with date faceting right now ... I
struggled with this a bit when designing it, and ultimately decided to
punt on the issue. The biggest hangup was that even if the facet counting
code was smart about making sure the ranges don't overlap, the range query
syntax in the QueryParser doesn't support ranges that include one endpoint
but exclude the other (so there wouldn't be a lot you can do with the
ranges once you know the counts in them).
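The limitation shows up in the drill-down filter a client builds for a selected range. A minimal Python sketch, assuming a hypothetical date field named event_date: the classic query syntax offers only [low TO high] (both ends inclusive) or {low TO high} (both ends exclusive), so neither form expresses "12:00 inclusive, 18:00 exclusive".

```python
def range_filter(field, low, high, inclusive=True):
    """Build a range query string; brackets are all-inclusive [..] or all-exclusive {..}."""
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{low} TO {high}{close_b}"

fq = range_filter("event_date", "2007-09-27T12:00:00Z", "2007-09-27T18:00:00Z")
print(fq)  # event_date:[2007-09-27T12:00:00Z TO 2007-09-27T18:00:00Z]
```

The inclusive form still matches an event at exactly 18:00, and the exclusive form would drop one at exactly 12:00.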

One idea I had in SOLR-258 was that we could add an "interval" option that
would define how much to add to the "end" of one range to get the "start"
of the next range (think of the current implementation as having interval
hardcoded to "0"). That would solve the problem and work with range
queries that are inclusive of both endpoints, but would require people to
use "-1MILLI" a lot.
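A minimal Python sketch of how the "interval" idea could compute ranges. The option was only proposed, so the function and parameter names here are hypothetical; ranges are assumed inclusive of both endpoints, with gap "+6HOURS-1MILLI" and interval "+1MILLI" standing in for the "-1MILLI" usage mentioned.

```python
from datetime import datetime, timedelta

def ranges_with_interval(start, end, gap, interval=timedelta(0)):
    """Hypothetical 'interval' option: each range starts `interval` after the previous end.

    Both endpoints of every range are treated as inclusive.
    """
    ranges = []
    low = start
    while low < end:
        high = low + gap
        ranges.append((low, high))
        low = high + interval
    return ranges

start = datetime(2007, 9, 27, 12, 0)
stop = start + timedelta(hours=12)
ms = timedelta(milliseconds=1)

# interval=0 reproduces the current behaviour: adjacent ranges share an endpoint
current = ranges_with_interval(start, stop, timedelta(hours=6))
# gap "+6HOURS-1MILLI" with interval "+1MILLI" gives non-overlapping ranges
proposed = ranges_with_interval(start, stop, timedelta(hours=6) - ms, ms)
```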

A better option (assuming a query parser change) would be a new option
that says whether each computed range should be inclusive of the low point,
the high point, both end points, or neither end point, or be "smart" (where
smart is the same as "low" except for the last range, where it includes
both).
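A minimal Python sketch of the membership test such an option implies. The mode names come from the post; the functions are hypothetical and model the proposal, not any Solr API.

```python
from datetime import datetime, timedelta

def in_range(value, low, high, mode, is_last=False):
    """Range membership for the proposed modes: low, high, both, neither, smart."""
    if mode == "smart":
        # same as "low", except the last range includes both ends
        mode = "both" if is_last else "low"
    lower_ok = value >= low if mode in ("low", "both") else value > low
    upper_ok = value <= high if mode in ("high", "both") else value < high
    return lower_ok and upper_ok

def count(ranges, events, mode):
    """Per-range event counts under the given mode."""
    last = len(ranges) - 1
    return [sum(in_range(e, lo, hi, mode, i == last) for e in events)
            for i, (lo, hi) in enumerate(ranges)]

def at(hours):
    return datetime(2007, 9, 27, 12, 0) + timedelta(hours=hours)

ranges = [(at(0), at(6)), (at(6), at(12))]
events = [at(0), at(6), at(12)]  # both outer edges plus the shared boundary
print(count(ranges, events, "smart"))  # [1, 2]: every event counted exactly once
```

"both" double-counts the 18:00 event and "low" loses the one at the final edge, which is why "smart" special-cases the last range.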

(I think there's already a Lucene issue to add the query parser support; I
just haven't had time to look at it.)

The simple workaround: if you know all of your data is indexed with
whole-second precision (milliseconds always .000), then put "-1MILLI" at
the end of your start and end date faceting params.
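A minimal Python sketch of request parameters with the workaround applied, assuming the facet.date, facet.date.start, facet.date.end and facet.date.gap params and a hypothetical date field named event_date. The helper is illustrative; check the exact parameter names against your Solr version.

```python
from urllib.parse import urlencode

def date_facet_params(field, start, end, gap):
    """Date-facet request params with "-1MILLI" appended to start and end."""
    # `field` is a placeholder; start/end are assumed to be Solr date strings.
    return {
        "q": "*:*",
        "facet": "true",
        "facet.date": field,
        "facet.date.start": start + "-1MILLI",
        "facet.date.end": end + "-1MILLI",
        "facet.date.gap": gap,
    }

params = date_facet_params("event_date", "2007-09-27T12:00:00Z",
                           "2007-09-28T00:00:00Z", "+6HOURS")
print(urlencode(params))
```

With whole-second data, an event at exactly 18:00:00 then falls only in the range keyed 17:59:59.999, at the cost of range keys shifted by one millisecond.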



-Hoss