Posted to solr-user@lucene.apache.org by britske <gb...@gmail.com> on 2011/09/26 11:51:41 UTC

multiple dateranges/timeslots per doc: modeling openinghours.

Sorry for the somewhat lengthy post; I would like to make clear that I have covered
my bases here and am looking for an alternative solution, because the more
trivial solutions don't seem to work for my use-case.

Consider bars, museums, etc.

These places have multiple openinghours that can depend on:
REQ 1. day of week
REQ 2. special days on which they are closed, or otherwise have
different openinghours than their related 'day of week'

Now, I want to model these 'places' in a way that allows me to do temporal
queries like:
- which bars are open NOW (and stay open for at least another 3 hours)
- which museums are (already) open at 25-12-2011 10AM and stay open until
(at least) 3PM.

I believe having opening/closing hours available for each day at least gives
me the data needed to query the above. (Note that having
dayOfWeek*openinghours is not enough, because of the special cases in REQ 2.)

Okay, knowing I need openinghours*dates for each place, how would I model
this in documents?

OPTION A) 
-----------
Considering granularity: I want documents to represent Places and not
Places*dates. Although the latter would trivially allow me to do the querying
mentioned above, it has these disadvantages:

- The same place is returned multiple times (each with a different date) when
queries are not constrained to a date.
- Lots of data needs to be duplicated, all for the conceptually 'simple'
functionality of needing multiple date-ranges. It feels bad, and a simpler
solution should exist?
- The resultset explodes (documents = say, 100 dates * 1,000,000 places =
100,000,000); suddenly the size of the resultset goes from 'easily doable'
to 'hmmm, I have to think about this'. Given that places also have some other
fields to sort on, Lucene fieldcache mem-usage would explode by a factor of
100.

OPTION B)
----------
Another, faulty, option would be to model opening/closing hours in 2
multivalued date-fields, i.e. 'open' and 'close', and insert open/close for each
day, e.g.:

open: 2011-11-08:1800 - close: 2011-11-09:0300
open: 2011-11-09:1700 - close: 2011-11-10:0500
open: 2011-11-10:1700 - close: 2011-11-11:0300

And queries would be of the form:

'open < now && close > now+3h'

But since there is no way to indicate that 'open' and 'close' are pairwise
related, I will get a lot of false positives; e.g. the above document would be
returned for:

open < 2011-11-09:0100 && close > 2011-11-09:0600

because SOME open date is before 2011-11-09:0100 (i.e. 2011-11-08:1800) and
SOME close date is after 2011-11-09:0600 (for example 2011-11-11:0300), but
these open and close dates are not pairwise related.

OPTION C) The best of what I have now:
---------------------------------------
I have been thinking about a totally different approach using Solr dynamic
fields, in which each and every opening and closing date gets its own
dynamic field, e.g. (for the same data as above):

_date_2011-11-08_open: 1800
_date_2011-11-09_close: 0300
_date_2011-11-09_open: 1700
_date_2011-11-10_close: 0500
_date_2011-11-10_open: 1700
_date_2011-11-11_close: 0300

Then, the client should know the date to query, and thus the correct fields
to query. This would solve the problem, since startdate / enddate are now
pairwise related, but I fear this can be a big issue from a performance
standpoint (especially memory consumption of the Lucene fieldcache).
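
For illustration, a minimal sketch of how a client could build such a per-date
filter with SolrJ. The field-naming scheme is the one above; the helper name,
the HHmm encoding, and the fact that past-midnight closings (which this scheme
stores under the next date) are simply ignored are my own simplifications:

  import org.apache.solr.client.solrj.SolrQuery;

  public class OpeningHoursFilter {

      // Builds a filter query against the per-date dynamic fields of OPTION C.
      // 'date' is e.g. "2011-11-09", 'from'/'until' are zero-padded HHmm strings.
      static String openBetween(String date, String from, String until) {
          String openField  = "_date_" + date + "_open";
          String closeField = "_date_" + date + "_close";
          return "+" + openField + ":[* TO " + from + "]"
               + " +" + closeField + ":[" + until + " TO *]";
      }

      public static void main(String[] args) {
          SolrQuery q = new SolrQuery("*:*");
          // open at 10:00 on 2011-11-09 and still open at 13:00
          q.addFilterQuery(openBetween("2011-11-09", "1000", "1300"));
          System.out.println(q);
      }
  }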


IDEAL OPTION D) 
----------------
I'm pretty sure this does not exist out-of-the-box, but Solr might be extended.
Okay, Solr has a fieldtype 'date', but what if it also had a fieldtype
'Daterange'? A Daterange would be modeled as <DateTimeA,DateTimeB> or
<DateTimeA,Delta DateTimeA>.

Then this problem would be really easily modelled as a multivalued field
'openinghours' of type 'Daterange'.
However, I have the feeling that the standard range-query implementation
can't be used on this fieldtype, or perhaps would have to be run for each of
the N daterange-values in 'openinghours'.

To make matters worse (I didn't want to introduce this above):
REQ 3: Certain places may have multiple opening-hours / timeslots
each day. Consider a museum in Spain which closes around noon
because of siesta-time.
OPTION D) would be able to handle this natively; all other options can't.

I would very much appreciate any pointers to:
 - how to start with option D, and whether this approach is at all feasible.
 - whether option C would suffice (excluding REQ 3), and whether I'm likely to run
into performance / memory troubles.
 - any other possible solutions I haven't thought of to tackle this.

Thanks a lot. 

Cheers,
Geert-Jan







Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
In case anyone is curious, I responded to him with a solution using either
SOLR-2155 (Geohash prefix query filter) or LSP:
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244

~ David Smiley

-----
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Chris Hostetter <ho...@fucit.org>.
: This would need 2*3*100 = 600 dynamicfields to cover the openinghours. You
: mention this is peanuts for constructing a booleanquery, but how about
: memory consumption?
: I'm particularly concerned about the Lucene FieldCache getting populated for
: each of the 600 fields. (Since I had some nasty OOM experiences with that in
: the past. 2-3 years ago memory consumption of Lucene FieldCache couldn't be
: controlled, I'm not sure how that is now to be honest)
: 
: I will not be sorting on any of the 600 dynamicfields btw. Instead I will
: only use them as part of the above booleanquery, which I will likely define
: as a Filter Query.
: Just to be sure, in this situation, Lucene FieldCache won't be touched,
: correct? If so, this will probably be a good workable solution!

correct.  searching on fields doesn't use the FieldCache (unless you are 
doing a function query - you aren't in this case) so the memory usage of 
FieldCache wouldn't be a factor here at all.


-Hoss

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Geert-Jan Brits <gb...@gmail.com>.
On 11 October 2011 03:21, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Conceptually
> : the Join-approach looks like it would work from paper, although I'm not a
> : big fan of introducing a lot of complexity to the frontend / querying
> part
> : of the solution.
>
> you lost me there -- i don't see how using join would impact the front end
> / query side at all.  your query clients would never even know that a join
> had happened (your indexing code would certianly have to know about
> creating those special case docs to join against obviuosly)
>
> : As an alternative, what about using your fieldMaskingSpanQuery-approach
> : solely (without the JOIN-approach)  and encode open/close on a per day
> : basis?
> : I didn't mention it, but I 'only' need 100 days of data, which would lead
> to
> : 100 open and 100 close values, not counting the pois with multiple
>         ...
> : Data then becomes:
> :
> : open: 20111020_12_30, 20111021_12_30, 20111022_07_30, ...
> : close: 20111020_20_00, 20111021_26_30, 20111022_12_30, ...
>
> aw hell ... i assumed you needed to suport an arbitrarily large number
> of special case open+close pairs per doc.
>

I didn't express myself well. A POI can have multiple open+close pairs per
day, but each night I only index the coming 100 days. So MOST POIs will have
100 open+close pairs (one set of openinghours per day) but some have more.


>
> if you only have to support a fix value (N=100) open+close values you
> could just have N*2 date fields and a BooleanQuery containing N 2-clause
> BooleanQueries contain ranging queries against each pair of your date
> fields. ie...
>
>  ((+open00:[* TO NOW] +close00:[NOW+3HOURS TO *])
>   (+open01:[* TO NOW] +close01:[NOW+3HOURS TO *])
>   (+open02:[* TO NOW] +close02:[NOW+3HOURS TO *])
>   ...etc...
>   (+open99:[* TO NOW] +close99:[NOW+3HOURS TO *]))
>
> ...for a lot of indexes, 100 clauses is small potatoes as far as number of
> boolean clauses go, especially if many of them are going to short circut
> out because there won't be any matches at all.
>

Given that I need multiple open+close pairs per day, this can't be used
directly.

However, when setting a logical upper bound on the maximum number of
openinghours per day (say 3), which would be possible, this could be extended
to: open00 = day0 --> open00-0 = day0 timeslot 0, open00-1 = day0 timeslot 1,
etc.

So,

 ((+open00-0:[* TO NOW] +close00-0:[NOW+3HOURS TO *])
  (+open00-1:[* TO NOW] +close00-1:[NOW+3HOURS TO *])
  (+open00-2:[* TO NOW] +close00-2:[NOW+3HOURS TO *])
  (+open01-0:[* TO NOW] +close01-0:[NOW+3HOURS TO *])
  (+open01-1:[* TO NOW] +close01-1:[NOW+3HOURS TO *])
  (+open01-2:[* TO NOW] +close01-2:[NOW+3HOURS TO *])
  ...etc...
  (+open99-2:[* TO NOW] +close99-2:[NOW+3HOURS TO *]))

This would need 2*3*100 = 600 dynamicfields to cover the openinghours. You
mention this is peanuts for constructing a booleanquery, but how about
memory consumption?
I'm particularly concerned about the Lucene FieldCache getting populated for
each of the 600 fields. (Since I had some nasty OOM experiences with that in
the past. 2-3 years ago memory consumption of Lucene FieldCache couldn't be
controlled, I'm not sure how that is now to be honest)

I will not be sorting on any of the 600 dynamicfields btw. Instead I will
only use them as part of the above booleanquery, which I will likely define
as a Filter Query.
Just to be sure, in this situation, Lucene FieldCache won't be touched,
correct? If so, this will probably be a good workable solution!
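
For reference, a minimal sketch of generating the day x timeslot filter string
sketched above. Only the openNN-S / closeNN-S field naming and the
NOW / NOW+3HOURS window come from the example; everything else is an assumption:

  public class OpenNowFilterBuilder {

      // One 2-clause sub-query per (day, timeslot) pair, OR-ed together.
      static String buildFilter(int days, int slotsPerDay) {
          StringBuilder fq = new StringBuilder("(");
          for (int day = 0; day < days; day++) {
              for (int slot = 0; slot < slotsPerDay; slot++) {
                  String suffix = String.format("%02d-%d", day, slot);
                  fq.append("(+open").append(suffix).append(":[* TO NOW]")
                    .append(" +close").append(suffix).append(":[NOW+3HOURS TO *]) ");
              }
          }
          return fq.append(")").toString();
      }

      public static void main(String[] args) {
          // 100 days * 3 timeslots = 300 sub-queries over 600 dynamic fields
          System.out.println(buildFilter(100, 3));
      }
  }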


> : Alternatively, how would you compare your suggested approach with the
> : approach by David Smiley using either SOLR-2155 (Geohash prefix query
> : filter) or LSP:
> :
> https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244
> .
> : That would work right now, and the LSP-approach seems pretty elegant to
> me.
>
> I'm afraid i'm totally ignorant of how the LSP stuff works so i can't
> really comment there.
>
> If i understand what you mean about mapping the open/close concepts to
> lat/lon concepts, then i can see how it would be useful for multiple pair
> wise (absolute) date ranges, but i'm not really sure how you would deal
> with the diff open+close pairs per day (or on diff days of hte week, or
> special days of the year) using the lat+lon conceptual model ... I guess
> if the LSP stuff supports arbitrary N-dimensional spaces then you could
> model day or week as a dimension .. but it still seems like you'd need
> multiple fields for the special case days, right?
>

I planned to do the following using LSP (with help from David):

Each <open,close>-tuple would be modeled as a point (x, y), with x = open and
y = close.
So a POI can have many (100 or more) points, each representing
a <open,close>-tuple.

Given a 100-day lookahead and a granularity of 5 minutes, we can map
dimensions x and y to [0, 30000].

E.g:
- indexing starts at / baseline is at: 2011-11-01:0000
- poi open: 2011-11-08:1800 - poi close: 2011-11-09:0300
- (query): user visit: 2011-11-08:2300 - user depart: 2011-11-09:0200

Would map to:
- poi open: 2520 - poi close: 2628 =  point(x,y) = (2520,2628)
- (query):user visit: 2580 - user depart: 2616 = bbox filter with the
ranges x:[0 TO 2580], y:[2616 TO 30000]

All pois are returned which have one or more points within the bbox.
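
A rough sketch of that mapping, assuming the 2011-11-01 baseline and 5-minute
granularity described above. The helper is mine; the exact slot numbers depend
on how the baseline day is counted (so they may differ slightly from the figures
in the example), and the final filter syntax depends on the LSP / SOLR-2155 setup:

  import java.text.SimpleDateFormat;
  import java.util.Date;
  import java.util.TimeZone;

  public class OpenCloseAsPoint {

      static final long SLOT_MILLIS = 5 * 60 * 1000L;   // 5-minute granularity

      // Slot index of an absolute time, relative to the chosen baseline.
      static long slot(Date baseline, Date t) {
          return (t.getTime() - baseline.getTime()) / SLOT_MILLIS;
      }

      public static void main(String[] args) throws Exception {
          SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd:HHmm");
          f.setTimeZone(TimeZone.getTimeZone("UTC"));
          Date baseline = f.parse("2011-11-01:0000");

          // Each <open,close> tuple of a POI becomes a point (x = open, y = close)
          long x = slot(baseline, f.parse("2011-11-08:1800"));
          long y = slot(baseline, f.parse("2011-11-09:0300"));
          System.out.println("point: (" + x + "," + y + ")");

          // "visit .. depart" becomes a bbox: x <= visit slot, y >= depart slot
          long visit  = slot(baseline, f.parse("2011-11-08:2300"));
          long depart = slot(baseline, f.parse("2011-11-09:0200"));
          System.out.println("bbox ranges: x:[0 TO " + visit + "], y:[" + depart + " TO 30000]");
      }
  }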

Both approaches seem pretty good to me. I'll be testing both soon.

Thanks!
Geert-Jan




> How it would compare performance wise: no idea.
>
>
> -Hoss
>

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Chris Hostetter <ho...@fucit.org>.
: Conceptually
: the Join-approach looks like it would work from paper, although I'm not a
: big fan of introducing a lot of complexity to the frontend / querying part
: of the solution.

you lost me there -- i don't see how using join would impact the front end 
/ query side at all.  your query clients would never even know that a join 
had happened (your indexing code would certainly have to know about 
creating those special case docs to join against obviously)

: As an alternative, what about using your fieldMaskingSpanQuery-approach
: solely (without the JOIN-approach)  and encode open/close on a per day
: basis?
: I didn't mention it, but I 'only' need 100 days of data, which would lead to
: 100 open and 100 close values, not counting the pois with multiple
	...
: Data then becomes:
: 
: open: 20111020_12_30, 20111021_12_30, 20111022_07_30, ...
: close: 20111020_20_00, 20111021_26_30, 20111022_12_30, ...

aw hell ... i assumed you needed to support an arbitrarily large number 
of special case open+close pairs per doc.

if you only have to support a fixed number (N=100) of open+close values you 
could just have N*2 date fields and a BooleanQuery containing N 2-clause 
BooleanQueries, each containing range queries against one pair of your date 
fields. ie...

  ((+open00:[* TO NOW] +close00:[NOW+3HOURS TO *])
   (+open01:[* TO NOW] +close01:[NOW+3HOURS TO *])
   (+open02:[* TO NOW] +close02:[NOW+3HOURS TO *])
   ...etc...
   (+open99:[* TO NOW] +close99:[NOW+3HOURS TO *]))

...for a lot of indexes, 100 clauses is small potatoes as far as the number of 
boolean clauses goes, especially if many of them are going to short-circuit 
out because there won't be any matches at all.
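
For illustration, a sketch of that query shape in Lucene terms, using plain
TermRangeQuery over encoded string values (with real trie date fields you would
use NumericRangeQuery instead, or simply send the equivalent fq string):

  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.TermRangeQuery;

  public class OpenCloseBooleanQuery {

      // N 2-clause BooleanQueries, OR-ed into one outer BooleanQuery:
      //   (+openNN:[* TO now] +closeNN:[until TO *]) for NN = 00..N-1
      static BooleanQuery openNowFilter(int n, String now, String until) {
          BooleanQuery outer = new BooleanQuery();
          for (int i = 0; i < n; i++) {
              String suffix = String.format("%02d", i);
              BooleanQuery pair = new BooleanQuery();
              pair.add(new TermRangeQuery("open" + suffix, null, now, true, true), Occur.MUST);
              pair.add(new TermRangeQuery("close" + suffix, until, null, true, true), Occur.MUST);
              outer.add(pair, Occur.SHOULD);
          }
          return outer;
      }

      public static void main(String[] args) {
          // the values would be the same encoded day+time strings used at index time
          System.out.println(openNowFilter(100, "20111225_10_17", "20111225_13_17"));
      }
  }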

: Alternatively, how would you compare your suggested approach with the
: approach by David Smiley using either SOLR-2155 (Geohash prefix query
: filter) or LSP:
: https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244.
: That would work right now, and the LSP-approach seems pretty elegant to me.

I'm afraid i'm totally ignorant of how the LSP stuff works so i can't 
really comment there.

If i understand what you mean about mapping the open/close concepts to 
lat/lon concepts, then i can see how it would be useful for multiple 
pairwise (absolute) date ranges, but i'm not really sure how you would deal 
with the diff open+close pairs per day (or on diff days of the week, or 
special days of the year) using the lat+lon conceptual model ... I guess 
if the LSP stuff supports arbitrary N-dimensional spaces then you could 
model day or week as a dimension .. but it still seems like you'd need 
multiple fields for the special case days, right?

How it would compare performance wise: no idea.


-Hoss

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Geert-Jan Brits <gb...@gmail.com>.
Thanks Hoss for that in-depth walkthrough.

I like your solution of using (something akin to)
FieldMaskingSpanQuery<https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html>.
Conceptually
the Join-approach looks like it would work on paper, although I'm not a
big fan of introducing a lot of complexity to the frontend / querying part
of the solution.

As an alternative, what about using your fieldMaskingSpanQuery-approach
solely (without the JOIN-approach) and encoding open/close on a per-day
basis?
I didn't mention it, but I 'only' need 100 days of data, which would lead to
100 open and 100 close values, not counting the pois with multiple
openinghours per day, which are pretty rare.
The index is rebuilt each night, refreshing the date-data.

I'm not sure what the performance implications would be like, but somehow
that feels doable. Perhaps it even offsets the extra time needed for doing
the Joins, only 1 way to find out I guess.
Disadvantage would be fewer cache-hits when using FQ.

Data then becomes:

open: 20111020_12_30, 20111021_12_30, 20111022_07_30, ...
close: 20111020_20_00, 20111021_26_30, 20111022_12_30, ...

Notice the 20111021_26_30, which indicates closing at 2:30 AM the next day;
this would work (in contrast to encoding it as 20111022_02_30).
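
A small sketch of that encoding. The only idea taken from above is writing a
past-midnight closing as hour+24 on the opening day; the helper name and
parameters are made up for illustration:

  public class OpenCloseEncoder {

      // Encodes an open/close pair as sortable yyyyMMdd_HH_mm strings.
      // A close after midnight stays on the *opening* day by adding 24 to the
      // hour, e.g. closing at 02:30 the next night becomes "<openday>_26_30".
      static String[] encode(String openDay, int openHour, int openMin,
                             int closeHour, int closeMin, boolean closesNextDay) {
          int h = closesNextDay ? closeHour + 24 : closeHour;
          String open  = String.format("%s_%02d_%02d", openDay, openHour, openMin);
          String close = String.format("%s_%02d_%02d", openDay, h, closeMin);
          return new String[] { open, close };
      }

      public static void main(String[] args) {
          // open 2011-10-21 12:30, close 02:30 the following night
          String[] pair = encode("20111021", 12, 30, 2, 30, true);
          System.out.println("open:  " + pair[0]);   // 20111021_12_30
          System.out.println("close: " + pair[1]);   // 20111021_26_30
      }
  }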

Alternatively, how would you compare your suggested approach with the
approach by David Smiley using either SOLR-2155 (Geohash prefix query
filter) or LSP:
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244.
That would work right now, and the LSP-approach seems pretty elegant to me.
FQ-style caching is probably not possible though.

Geert-Jan

On 1 October 2011 04:25, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Another, faulty, option would be to model opening/closing hours in 2
> : multivalued date-fields, i.e: open, close. and insert open/close for each
> : day, e.g:
> :
> : open: 2011-11-08:1800 - close: 2011-11-09:0300
> : open: 2011-11-09:1700 - close: 2011-11-10:0500
> : open: 2011-11-10:1700 - close: 2011-11-11:0300
> :
> : And queries would be of the form:
> :
> : 'open < now && close > now+3h'
> :
> : But since there is no way to indicate that 'open' and 'close' are
> pairwise
> : related I will get a lot of false positives, e.g the above document would
> be
> : returned for:
>
> This isn't possible out of the box, but the general idea of "position
> linked" queries is possible using the same approach as the
> FieldMaskingSpanQuery...
>
>
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> https://issues.apache.org/jira/browse/LUCENE-1494
>
> ..implementing something like this that would work with
> (Numeric)RangeQueries however would require some additional work, but it
> should certianly be doable -- i've suggested this before but no one has
> taken me up on it...
> http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
>
> If we take it as a given that you can do multiple ranges "at the same
> position", then you can imagine supporting all of your "regular" hours
> using just two fields ("open" and "close") by encoding the day+time of
> each range of open hours into them -- even if a store is open for multiple
> sets of ranges per day (ie: closed for siesta)...
>
>  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
>  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
>
> then asking for "stores open now and for the next 3 hours" on "wed" at
> "2:13PM" becomes a query for...
>
> sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
>
> For the special case part of your problem when there are certain dates
> that a store will be open atypical hours, i *think* that could be solved
> using some special docs and the new "join" QParser in a filter query...
>
>        https://wiki.apache.org/solr/Join
>
> imagine you have your "regular" docs with all the normal data about a
> store, and the open/close fields i describe above.  but in addition to
> those, for any store that you know is "closed on dec 25" or "only open
> 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> the information about the stores closures on that special date - so that
> each special case would be it's own doc, even if one store had 5 days
> where there was a special case...
>
>  specialdoc1:
>    store_id: 42
>    special_date: Dec-25
>    status: closed
>  specialdoc2:
>    store_id: 42
>    special_date: Jan-01
>    status: irregular
>    open: 09_30
>    close: 13_00
>
> then when you are executing your query, you use an "fq" to constrain to
> stores that are (normally) open right now (like i mentioned above) and you
> use another fq to find all docs *except* those resulting from a join
> against these special case docs based on the current date.
>
> so if you r query is "open now and for the next 3 hours" and "now" ==
> "sunday, 2011-12-25 @ 10:17AM your query would be something like...
>
> q=...user input...
> time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> fq={!v=time}
> fq={!join from=store_id to=unique_key v=$vv}
> vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{v=$time}"))
>
> That join based approach for dealing with the special dates should work
> regardless of wether someone implements a way to do pair wise
> "sameposition()" rangequeries ... so if you can live w/o the multiple
> open/close pairs per day, you can just use the "one field per day of hte
> week" type approach you mentioned combined with the "join" for special
> case days of hte year and everything you need should already work w/o any
> code (on trunk).
>
> (disclaimer: obviously i haven't tested that query, the exact syntax may
> be off but the princible for modeling the "special docs" and using
> them in a join should work)
>
>
> -Hoss
>

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
On Mon, Oct 3, 2011 at 3:09 PM, Geert-Jan Brits <gb...@gmail.com> wrote:

> Interesting! Reading your previous blogposts, I gather that the to be
> posted
> 'implementation approaches' includes a way of making the SpanQueries
> available within SOLR?
>

It's going to be posted in two days. But please don't expect much from it;
it's just a proof of concept. It's not code for production nor for
contribution. E.g. we've chosen the 'quick hack' way of converting boolean
queries instead of XmlQuery, SurroundParser or contrib's query parser, etc.
I.e. we can only share the core ideas, and some of these are possibly wrong.


> Also, would with your approach would (numeric) RangeQueries be possible as
> Hoss suggests?
>

Basically, range queries over terms are just disjunctions of the matching terms
(for numbers that's sometimes not great at all). If you encode your terms in a
sortable manner, e.g. A0715 for Monday 07:15, you'll be able to build the span
equivalent by merging them: new SpanOrQuery(new SpanTermQuery(..), ....).
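
A sketch of that idea: expand a (sortably encoded) term range into its matching
terms and merge them into one SpanOrQuery, which can then take part in
position-linked span queries. In practice the terms would be enumerated from the
index; a plain list keeps the sketch self-contained:

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanOrQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  public class SpanRangeSketch {

      // Span equivalent of a term range query: OR together every encoded term
      // that falls inside [lower, upper] (string order == time order).
      static SpanOrQuery spanRange(String field, List<String> indexedTerms,
                                   String lower, String upper) {
          List<SpanQuery> clauses = new ArrayList<SpanQuery>();
          for (String t : indexedTerms) {
              if (t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0) {
                  clauses.add(new SpanTermQuery(new Term(field, t)));
              }
          }
          return new SpanOrQuery(clauses.toArray(new SpanQuery[clauses.size()]));
      }

      public static void main(String[] args) {
          // e.g. "A0715" = Monday 07:15, encoded so plain string order matches time order
          List<String> terms = Arrays.asList("A0700", "A0715", "A0730", "B0900");
          System.out.println(spanRange("open", terms, "A0700", "A0730"));
      }
  }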

Regards

Mikhail


> Looking forward to that 'implementation post'
> Cheers,
> Geert-Jan
>
> On 1 October 2011 19:57, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:
>
> > I agree about SpanQueries. It's a viable measure against "false-positive
> > matches on multivalue fields".
> >  we've implemented this approach some time ago. Pls find details at
> >
> >
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
> >
> > and
> >
> >
> http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
> > we are going to publish the third post about an implementation
> approaches.
> >
> > --
> > Mikhail Khludnev
> >
> >
> > On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter <
> hossman_lucene@fucit.org
> > >wrote:
> >
> > >
> > > : Another, faulty, option would be to model opening/closing hours in 2
> > > : multivalued date-fields, i.e: open, close. and insert open/close for
> > each
> > > : day, e.g:
> > > :
> > > : open: 2011-11-08:1800 - close: 2011-11-09:0300
> > > : open: 2011-11-09:1700 - close: 2011-11-10:0500
> > > : open: 2011-11-10:1700 - close: 2011-11-11:0300
> > > :
> > > : And queries would be of the form:
> > > :
> > > : 'open < now && close > now+3h'
> > > :
> > > : But since there is no way to indicate that 'open' and 'close' are
> > > pairwise
> > > : related I will get a lot of false positives, e.g the above document
> > would
> > > be
> > > : returned for:
> > >
> > > This isn't possible out of the box, but the general idea of "position
> > > linked" queries is possible using the same approach as the
> > > FieldMaskingSpanQuery...
> > >
> > >
> > >
> >
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> > > https://issues.apache.org/jira/browse/LUCENE-1494
> > >
> > > ..implementing something like this that would work with
> > > (Numeric)RangeQueries however would require some additional work, but
> it
> > > should certianly be doable -- i've suggested this before but no one has
> > > taken me up on it...
> > > http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
> > >
> > > If we take it as a given that you can do multiple ranges "at the same
> > > position", then you can imagine supporting all of your "regular" hours
> > > using just two fields ("open" and "close") by encoding the day+time of
> > > each range of open hours into them -- even if a store is open for
> > multiple
> > > sets of ranges per day (ie: closed for siesta)...
> > >
> > >  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
> > >  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
> > >
> > > then asking for "stores open now and for the next 3 hours" on "wed" at
> > > "2:13PM" becomes a query for...
> > >
> > > sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
> > >
> > > For the special case part of your problem when there are certain dates
> > > that a store will be open atypical hours, i *think* that could be
> solved
> > > using some special docs and the new "join" QParser in a filter query...
> > >
> > >        https://wiki.apache.org/solr/Join
> > >
> > > imagine you have your "regular" docs with all the normal data about a
> > > store, and the open/close fields i describe above.  but in addition to
> > > those, for any store that you know is "closed on dec 25" or "only open
> > > 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> > > the information about the stores closures on that special date - so
> that
> > > each special case would be it's own doc, even if one store had 5 days
> > > where there was a special case...
> > >
> > >  specialdoc1:
> > >    store_id: 42
> > >    special_date: Dec-25
> > >    status: closed
> > >  specialdoc2:
> > >    store_id: 42
> > >    special_date: Jan-01
> > >    status: irregular
> > >    open: 09_30
> > >    close: 13_00
> > >
> > > then when you are executing your query, you use an "fq" to constrain to
> > > stores that are (normally) open right now (like i mentioned above) and
> > you
> > > use another fq to find all docs *except* those resulting from a join
> > > against these special case docs based on the current date.
> > >
> > > so if you r query is "open now and for the next 3 hours" and "now" ==
> > > "sunday, 2011-12-25 @ 10:17AM your query would be something like...
> > >
> > > q=...user input...
> > > time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> > > fq={!v=time}
> > > fq={!join from=store_id to=unique_key v=$vv}
> > > vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{v=$time}"))
> > >
> > > That join based approach for dealing with the special dates should work
> > > regardless of wether someone implements a way to do pair wise
> > > "sameposition()" rangequeries ... so if you can live w/o the multiple
> > > open/close pairs per day, you can just use the "one field per day of
> hte
> > > week" type approach you mentioned combined with the "join" for special
> > > case days of hte year and everything you need should already work w/o
> any
> > > code (on trunk).
> > >
> > > (disclaimer: obviously i haven't tested that query, the exact syntax
> may
> > > be off but the princible for modeling the "special docs" and using
> > > them in a join should work)
> > >
> > >
> > > -Hoss
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail (Mike) Khludnev
> > Developer
> > Grid Dynamics
> > tel. 1-415-738-8644
> > Skype: mkhludnev
> > <http://www.griddynamics.com>
> >  <mk...@griddynamics.com>
> >
>



-- 
Sincerely yours
Mikhail (Mike) Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Geert-Jan Brits <gb...@gmail.com>.
Interesting! Reading your previous blog posts, I gather that the to-be-posted
'implementation approaches' include a way of making the SpanQueries
available within SOLR?
Also, with your approach, would (numeric) RangeQueries be possible, as
Hoss suggests?

Looking forward to that 'implementation post'
Cheers,
Geert-Jan

On 1 October 2011 19:57, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> I agree about SpanQueries. It's a viable measure against "false-positive
> matches on multivalue fields".
>  we've implemented this approach some time ago. Pls find details at
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> and
>
> http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
> we are going to publish the third post about an implementation approaches.
>
> --
> Mikhail Khludnev
>
>
> On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter <hossman_lucene@fucit.org
> >wrote:
>
> >
> > : Another, faulty, option would be to model opening/closing hours in 2
> > : multivalued date-fields, i.e: open, close. and insert open/close for
> each
> > : day, e.g:
> > :
> > : open: 2011-11-08:1800 - close: 2011-11-09:0300
> > : open: 2011-11-09:1700 - close: 2011-11-10:0500
> > : open: 2011-11-10:1700 - close: 2011-11-11:0300
> > :
> > : And queries would be of the form:
> > :
> > : 'open < now && close > now+3h'
> > :
> > : But since there is no way to indicate that 'open' and 'close' are
> > pairwise
> > : related I will get a lot of false positives, e.g the above document
> would
> > be
> > : returned for:
> >
> > This isn't possible out of the box, but the general idea of "position
> > linked" queries is possible using the same approach as the
> > FieldMaskingSpanQuery...
> >
> >
> >
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> > https://issues.apache.org/jira/browse/LUCENE-1494
> >
> > ..implementing something like this that would work with
> > (Numeric)RangeQueries however would require some additional work, but it
> > should certianly be doable -- i've suggested this before but no one has
> > taken me up on it...
> > http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
> >
> > If we take it as a given that you can do multiple ranges "at the same
> > position", then you can imagine supporting all of your "regular" hours
> > using just two fields ("open" and "close") by encoding the day+time of
> > each range of open hours into them -- even if a store is open for
> multiple
> > sets of ranges per day (ie: closed for siesta)...
> >
> >  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
> >  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
> >
> > then asking for "stores open now and for the next 3 hours" on "wed" at
> > "2:13PM" becomes a query for...
> >
> > sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
> >
> > For the special case part of your problem when there are certain dates
> > that a store will be open atypical hours, i *think* that could be solved
> > using some special docs and the new "join" QParser in a filter query...
> >
> >        https://wiki.apache.org/solr/Join
> >
> > imagine you have your "regular" docs with all the normal data about a
> > store, and the open/close fields i describe above.  but in addition to
> > those, for any store that you know is "closed on dec 25" or "only open
> > 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> > the information about the stores closures on that special date - so that
> > each special case would be it's own doc, even if one store had 5 days
> > where there was a special case...
> >
> >  specialdoc1:
> >    store_id: 42
> >    special_date: Dec-25
> >    status: closed
> >  specialdoc2:
> >    store_id: 42
> >    special_date: Jan-01
> >    status: irregular
> >    open: 09_30
> >    close: 13_00
> >
> > then when you are executing your query, you use an "fq" to constrain to
> > stores that are (normally) open right now (like i mentioned above) and
> you
> > use another fq to find all docs *except* those resulting from a join
> > against these special case docs based on the current date.
> >
> > so if you r query is "open now and for the next 3 hours" and "now" ==
> > "sunday, 2011-12-25 @ 10:17AM your query would be something like...
> >
> > q=...user input...
> > time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> > fq={!v=time}
> > fq={!join from=store_id to=unique_key v=$vv}
> > vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{v=$time}"))
> >
> > That join based approach for dealing with the special dates should work
> > regardless of wether someone implements a way to do pair wise
> > "sameposition()" rangequeries ... so if you can live w/o the multiple
> > open/close pairs per day, you can just use the "one field per day of hte
> > week" type approach you mentioned combined with the "join" for special
> > case days of hte year and everything you need should already work w/o any
> > code (on trunk).
> >
> > (disclaimer: obviously i haven't tested that query, the exact syntax may
> > be off but the princible for modeling the "special docs" and using
> > them in a join should work)
> >
> >
> > -Hoss
> >
>
>
>
> --
> Sincerely yours
> Mikhail (Mike) Khludnev
> Developer
> Grid Dynamics
> tel. 1-415-738-8644
> Skype: mkhludnev
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
I agree about SpanQueries. It's a viable measure against "false-positive
matches on multivalue fields".
We implemented this approach some time ago. Please find details at
http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html

and
http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
We are going to publish a third post about implementation approaches.

--
Mikhail Khludnev


On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter <ho...@fucit.org>wrote:

>
> : Another, faulty, option would be to model opening/closing hours in 2
> : multivalued date-fields, i.e: open, close. and insert open/close for each
> : day, e.g:
> :
> : open: 2011-11-08:1800 - close: 2011-11-09:0300
> : open: 2011-11-09:1700 - close: 2011-11-10:0500
> : open: 2011-11-10:1700 - close: 2011-11-11:0300
> :
> : And queries would be of the form:
> :
> : 'open < now && close > now+3h'
> :
> : But since there is no way to indicate that 'open' and 'close' are
> pairwise
> : related I will get a lot of false positives, e.g the above document would
> be
> : returned for:
>
> This isn't possible out of the box, but the general idea of "position
> linked" queries is possible using the same approach as the
> FieldMaskingSpanQuery...
>
>
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> https://issues.apache.org/jira/browse/LUCENE-1494
>
> ..implementing something like this that would work with
> (Numeric)RangeQueries however would require some additional work, but it
> should certianly be doable -- i've suggested this before but no one has
> taken me up on it...
> http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
>
> If we take it as a given that you can do multiple ranges "at the same
> position", then you can imagine supporting all of your "regular" hours
> using just two fields ("open" and "close") by encoding the day+time of
> each range of open hours into them -- even if a store is open for multiple
> sets of ranges per day (ie: closed for siesta)...
>
>  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
>  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
>
> then asking for "stores open now and for the next 3 hours" on "wed" at
> "2:13PM" becomes a query for...
>
> sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
>
> For the special case part of your problem when there are certain dates
> that a store will be open atypical hours, i *think* that could be solved
> using some special docs and the new "join" QParser in a filter query...
>
>        https://wiki.apache.org/solr/Join
>
> imagine you have your "regular" docs with all the normal data about a
> store, and the open/close fields i describe above.  but in addition to
> those, for any store that you know is "closed on dec 25" or "only open
> 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> the information about the stores closures on that special date - so that
> each special case would be it's own doc, even if one store had 5 days
> where there was a special case...
>
>  specialdoc1:
>    store_id: 42
>    special_date: Dec-25
>    status: closed
>  specialdoc2:
>    store_id: 42
>    special_date: Jan-01
>    status: irregular
>    open: 09_30
>    close: 13_00
>
> then when you are executing your query, you use an "fq" to constrain to
> stores that are (normally) open right now (like i mentioned above) and you
> use another fq to find all docs *except* those resulting from a join
> against these special case docs based on the current date.
>
> so if you r query is "open now and for the next 3 hours" and "now" ==
> "sunday, 2011-12-25 @ 10:17AM your query would be something like...
>
> q=...user input...
> time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> fq={!v=time}
> fq={!join from=store_id to=unique_key v=$vv}
> vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{v=$time}"))
>
> That join based approach for dealing with the special dates should work
> regardless of wether someone implements a way to do pair wise
> "sameposition()" rangequeries ... so if you can live w/o the multiple
> open/close pairs per day, you can just use the "one field per day of hte
> week" type approach you mentioned combined with the "join" for special
> case days of hte year and everything you need should already work w/o any
> code (on trunk).
>
> (disclaimer: obviously i haven't tested that query, the exact syntax may
> be off but the princible for modeling the "special docs" and using
> them in a join should work)
>
>
> -Hoss
>



-- 
Sincerely yours
Mikhail (Mike) Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: multiple dateranges/timeslots per doc: modeling openinghours.

Posted by Chris Hostetter <ho...@fucit.org>.
: Another, faulty, option would be to model opening/closing hours in 2
: multivalued date-fields, i.e: open, close. and insert open/close for each
: day, e.g: 
: 
: open: 2011-11-08:1800 - close: 2011-11-09:0300
: open: 2011-11-09:1700 - close: 2011-11-10:0500
: open: 2011-11-10:1700 - close: 2011-11-11:0300
: 
: And queries would be of the form:
: 
: 'open < now && close > now+3h'
: 
: But since there is no way to indicate that 'open' and 'close' are pairwise
: related I will get a lot of false positives, e.g the above document would be
: returned for:

This isn't possible out of the box, but the general idea of "position 
linked" queries is possible using the same approach as the 
FieldMaskingSpanQuery...

https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
https://issues.apache.org/jira/browse/LUCENE-1494

..implementing something like this that would work with 
(Numeric)RangeQueries however would require some additional work, but it 
should certainly be doable -- i've suggested this before but no one has 
taken me up on it...
http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery

If we take it as a given that you can do multiple ranges "at the same 
position", then you can imagine supporting all of your "regular" hours 
using just two fields ("open" and "close") by encoding the day+time of 
each range of open hours into them -- even if a store is open for multiple 
sets of ranges per day (ie: closed for siesta)...

  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...

then asking for "stores open now and for the next 3 hours" on "wed" at 
"2:13PM" becomes a query for...

sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
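
For reference, the existing term-level building block looks roughly like this
(adapted from the FieldMaskingSpanQuery javadoc pattern). It position-links one
exact open term with one exact close term; the range-aware sameposition()
variant is the part that would still need to be written:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.spans.FieldMaskingSpanQuery;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  public class SamePositionSketch {

      // The 'close' clause is masked as the 'open' field so SpanNearQuery can
      // compare positions across the two fields; slop -1 / unordered means the
      // two single-term spans must sit at the same position.
      static Query openCloseAt(String openTerm, String closeTerm) {
          SpanQuery open  = new SpanTermQuery(new Term("open", openTerm));
          SpanQuery close = new FieldMaskingSpanQuery(
                  new SpanTermQuery(new Term("close", closeTerm)), "open");
          return new SpanNearQuery(new SpanQuery[] { open, close }, -1, false);
      }

      public static void main(String[] args) {
          System.out.println(openCloseAt("wed_14_13", "wed_17_13"));
      }
  }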

For the special case part of your problem when there are certain dates 
that a store will be open atypical hours, i *think* that could be solved 
using some special docs and the new "join" QParser in a filter query...

	https://wiki.apache.org/solr/Join

imagine you have your "regular" docs with all the normal data about a 
store, and the open/close fields i describe above.  but in addition to 
those, for any store that you know is "closed on dec 25" or "only open 
12:00-15:00 on Jan 01" you add an additional small doc encapsulating 
the information about the store's closures on that special date - so that 
each special case would be its own doc, even if one store had 5 days 
where there was a special case...

  specialdoc1:
    store_id: 42
    special_date: Dec-25
    status: closed
  specialdoc2:
    store_id: 42
    special_date: Jan-01
    status: irregular
    open: 09_30
    close: 13_00
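
For illustration, a sketch of indexing such special-case docs with SolrJ. The
field names come from the example above; the id scheme and server URL are
assumptions:

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SpecialDayIndexer {

      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // "closed on Dec 25" for store 42
          SolrInputDocument closed = new SolrInputDocument();
          closed.addField("id", "special-42-Dec-25");   // hypothetical uniqueKey scheme
          closed.addField("store_id", 42);
          closed.addField("special_date", "Dec-25");
          closed.addField("status", "closed");

          // "only open 09:30-13:00 on Jan 01" for store 42
          SolrInputDocument irregular = new SolrInputDocument();
          irregular.addField("id", "special-42-Jan-01");
          irregular.addField("store_id", 42);
          irregular.addField("special_date", "Jan-01");
          irregular.addField("status", "irregular");
          irregular.addField("open", "09_30");
          irregular.addField("close", "13_00");

          server.add(closed);
          server.add(irregular);
          server.commit();
      }
  }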

then when you are executing your query, you use an "fq" to constrain to 
stores that are (normally) open right now (like i mentioned above) and you 
use another fq to find all docs *except* those resulting from a join 
against these special case docs based on the current date.

so if your query is "open now and for the next 3 hours" and "now" == 
"sunday, 2011-12-25 @ 10:17AM" your query would be something like...

q=...user input...
time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
fq={!v=time}
fq={!join from=store_id to=unique_key v=$vv}
vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{v=$time}"))

That join-based approach for dealing with the special dates should work 
regardless of whether someone implements a way to do pairwise 
"sameposition()" rangequeries ... so if you can live w/o the multiple 
open/close pairs per day, you can just use the "one field per day of the 
week" type approach you mentioned combined with the "join" for special 
case days of the year and everything you need should already work w/o any 
code (on trunk).

(disclaimer: obviously i haven't tested that query, the exact syntax may 
be off but the principle for modeling the "special docs" and using 
them in a join should work)


-Hoss