Posted to solr-user@lucene.apache.org by Anatoli Matuskova <an...@gmail.com> on 2011/10/27 12:06:54 UTC

Search calendar availability

Hello,
I want to filter search results by calendar availability. For each document I
know the days on which it is not available.
How could I build my fields to filter for the documents that are available
throughout a range of dates?
For example, document A is available from 1-9-2011 to 5-9-2011 and is also
available from 17-9-2011 to 22-9-2011 (it's not available in the gap in
between).
If the filter query asks for availability from 2-9-2011 to 4-9-2011, docA
would be a match.
If the filter query asks for availability from 2-9-2011 to 20-9-2011, docA
wouldn't be a match: even though the start and end dates are available, there
is a gap of unavailability between them.
Is this possible with Solr?
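
For illustration, the matching rule in the example above can be written down as
a small Python check. This is only a sketch of the intended semantics, not Solr
code; the function name is hypothetical, and it assumes that adjacent or
overlapping availability intervals have already been merged.

```python
from datetime import date

def is_available(intervals, start, end):
    """True if [start, end] lies entirely inside one availability interval.

    Assumes intervals are (first_day, last_day) pairs, inclusive, and that
    adjacent or overlapping intervals have already been merged.
    """
    return any(lo <= start and end <= hi for lo, hi in intervals)

# docA from the example: available 1-9-2011..5-9-2011 and 17-9-2011..22-9-2011
doc_a = [
    (date(2011, 9, 1), date(2011, 9, 5)),
    (date(2011, 9, 17), date(2011, 9, 22)),
]

print(is_available(doc_a, date(2011, 9, 2), date(2011, 9, 4)))   # True
print(is_available(doc_a, date(2011, 9, 2), date(2011, 9, 20)))  # False
```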

--
View this message in context: http://lucene.472066.n3.nabble.com/Search-calendar-avaliability-tp3457203p3457203.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Search calendar availability

Posted by Anatoli Matuskova <an...@gmail.com>.
> What does a lot mean?  How high is the sky? 
If I have 3 million docs, I would end up with 3 million * (days available)
documents.

> This can be done, given that you want long stretches of availability.
> But what happens when a reservation is canceled?  You have to coalesce
> intervals.  That isn't impossible, but it is a pain.
>
> Would this count as premature optimization? 

I always build the index from scratch from an external data source,
getting the availability from there (along with all the other data for a
document).

> If you want to drive down to a resolution of seconds, the document time slot
> model doesn't work.  But for days, it probably does.

Yes, availability is defined per day, not per second.

I'm trying to find a way to make this perform as well as possible.
I've found this, which is interesting too:
https://issues.apache.org/jira/browse/SOLR-1913
But the only way I see to use it is to generate dynamic fields per month and
filter on them. The problem is that for each month a search request filters
on, I would have to load a FieldCache.getInts array, and I would quickly
run out of memory (OOM).
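
To make the per-month idea concrete, here is a rough Python sketch. It assumes
each per-month field holds an integer bitmask with one bit per day; that
encoding is an assumption, since the message does not spell it out, and the
function names are hypothetical.

```python
def month_mask(available_days):
    """Pack the available days of one month (1..31) into an integer bitmask."""
    mask = 0
    for day in available_days:
        mask |= 1 << (day - 1)
    return mask

def range_mask(first_day, last_day):
    """Bitmask with a bit set for every day from first_day to last_day, inclusive."""
    length = last_day - first_day + 1
    return ((1 << length) - 1) << (first_day - 1)

def available(mask, first_day, last_day):
    """True if every requested day is set in the month's mask (bitwise AND test)."""
    wanted = range_mask(first_day, last_day)
    return (mask & wanted) == wanted

# September 2011 for docA: available on days 1-5 and 17-22
september = month_mask(list(range(1, 6)) + list(range(17, 23)))

print(available(september, 2, 4))   # True
print(available(september, 2, 20))  # False
```

A request spanning several months would need one such test per month field,
which is where the memory concern above comes from.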




Re: Search calendar availability

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Oct 27, 2011 at 7:13 AM, Anatoli Matuskova <anatoli.matuskova@gmail.com> wrote:

> I don't like the idea of indexing a doc for each value; the dataset can
> grow a lot.


What does a lot mean?  How high is the sky?

A million people with 3 year schedules is a billion tiny documents.

That doesn't sound like such an enormous number.
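
As a quick check of that estimate (assuming one document per person per day and
365-day years):

```python
people = 1_000_000
days_per_schedule = 3 * 365      # a 3-year schedule, ignoring leap days

day_documents = people * days_per_schedule
print(day_documents)             # 1095000000, i.e. roughly a billion
```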


> I have thought that something like this could work:
> At indexing time, if I know the dates of unavailability, I can derive the
> available ones (treating unknown as available). So I index 4 fields:
> aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all
> multiValued).
> If the user asks for availability from $start to $end, I filter like this:
>
> fq=aval_yes_start:[$start TO $end]
> &fq=aval_yes_end:[$start TO $end]
> &fq=-aval_no_start:[$start TO $end]
> &fq=-aval_no_end:[$start TO $end]
>

This can be done, given that you want long stretches of availability.
But what happens when a reservation is canceled?  You have to coalesce
intervals.  That isn't impossible, but it is a pain.

Would this count as premature optimization?

Simply retrieving days in the range and counting gets the right answer a bit
more simply.  Additions and deletions and modifications all work.

If you want to drive down to a resolution of seconds, the document time slot
model doesn't work.  But for days, it probably does.
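
A minimal Python sketch of the "retrieve days in the range and count" idea,
outside Solr. It assumes one (doc_id, day) pair per available day with no
duplicates; the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def days_between(start, end):
    """Yield every day from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)

def fully_available(day_docs, start, end):
    """Return ids whose available-day count in [start, end] equals the range length."""
    needed = (end - start).days + 1
    counts = Counter(doc_id for doc_id, day in day_docs if start <= day <= end)
    return {doc_id for doc_id, n in counts.items() if n == needed}

# One (doc_id, day) pair per available day, as in the day-document model.
day_docs = [("docA", d) for d in days_between(date(2011, 9, 1), date(2011, 9, 5))]
day_docs += [("docA", d) for d in days_between(date(2011, 9, 17), date(2011, 9, 22))]

print(fully_available(day_docs, date(2011, 9, 2), date(2011, 9, 4)))   # {'docA'}
print(fully_available(day_docs, date(2011, 9, 2), date(2011, 9, 20)))  # set()
```

Because each day is its own record, adding or cancelling a reservation only
adds or removes single pairs; nothing has to be coalesced.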

Re: Search calendar availability

Posted by Anatoli Matuskova <an...@gmail.com>.
I don't like the idea of indexing a doc for each value; the dataset can grow
a lot. I have thought that something like this could work:
At indexing time, if I know the dates of unavailability, I can derive the
available ones (treating unknown as available). So I index 4 fields:
aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all
multiValued).
If the user asks for availability from $start to $end, I filter like this:

fq=aval_yes_start:[$start TO $end]
&fq=aval_yes_end:[$start TO $end]
&fq=-aval_no_start:[$start TO $end]
&fq=-aval_no_end:[$start TO $end]

This way I make sure that the start date is available, that the end date is
too, and that there are no unavailable gaps in between.
As I store ranges rather than individual days, the number of multiValued
values shouldn't grow much, and with trie fields I think these range queries
should be fast.
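
The index-time step described above (deriving availability ranges from the
known unavailable days) could look roughly like this in Python. It is a sketch
only: the function name is hypothetical, and it assumes a bounded horizon
[first, last], since unknown days count as available.

```python
from datetime import date, timedelta

def available_ranges(unavailable, first, last):
    """Coalesce the days in [first, last] that are not unavailable into ranges."""
    ranges = []
    run_start = None              # first day of the current available run
    day = first
    while day <= last:
        if day in unavailable:
            if run_start is not None:
                # The run ended the day before this unavailable day.
                ranges.append((run_start, day - timedelta(days=1)))
                run_start = None
        elif run_start is None:
            run_start = day
        day += timedelta(days=1)
    if run_start is not None:
        ranges.append((run_start, last))
    return ranges

# docA is unavailable from 6-9-2011 to 16-9-2011 (the gap in the example)
blocked = {date(2011, 9, d) for d in range(6, 17)}
for start, end in available_ranges(blocked, date(2011, 9, 1), date(2011, 9, 22)):
    print(start.isoformat(), end.isoformat())
# 2011-09-01 2011-09-05
# 2011-09-17 2011-09-22
```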

Any better ideas?
 


Re: Search calendar availability

Posted by lee carroll <le...@googlemail.com>.
Do your docs have daily availability?
If so, you could index each doc once for each day (rather than having some
logic embedded in your data).

So instead of doc1 (1/9/2011 - 5/9/2011)
you have:
doc1 1/9/2011
doc1 2/9/2011
doc1 3/9/2011
doc1 4/9/2011
doc1 5/9/2011

This makes search much easier and more flexible. If you need to present
results to the user at doc level, you can collapse on doc id, or even
group by date.

The problem arises because you have logic and data in a field;
get rid of the logic and just store the data.
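
A small Python sketch of that expansion step, for illustration. The field names
(id, doc_id, day) are hypothetical and would have to match your own schema.

```python
from datetime import date, timedelta

def expand_to_day_docs(doc_id, ranges):
    """Turn inclusive (start, end) availability ranges into one record per day."""
    docs = []
    for start, end in ranges:
        day = start
        while day <= end:
            docs.append({
                "id": f"{doc_id}_{day.isoformat()}",  # unique key per doc and day
                "doc_id": doc_id,                     # key to collapse/group on
                "day": day.isoformat(),
            })
            day += timedelta(days=1)
    return docs

day_docs = expand_to_day_docs("doc1", [(date(2011, 9, 1), date(2011, 9, 5))])
for doc in day_docs:
    print(doc["doc_id"], doc["day"])
```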

Cheers Lee c


On 27 October 2011 12:36, Per Newgro <pe...@gmx.ch> wrote:
> What you are looking for is, IMHO, not related to Solr in particular.
> The topic should be Solr as a "temporal database".
> In your case, if you have a timeline from 0 to 10 and you have two
> documents from 1 to 6 and 5 to 13, you can get all documents within 0 - 10
> by querying document.end >= 0 and document.start <= 10.
> Whether to use >= or > (and <= or <) depends on your definition of outside
> and inside the interval. But note the swapped fields: end is compared with
> the lower bound and start with the upper bound.
>
> Hth
> Per

Re: Search calendar availability

Posted by Per Newgro <pe...@gmx.ch>.
What you are looking for is, IMHO, not related to Solr in particular.
The topic should be Solr as a "temporal database".
In your case, if you have a timeline from 0 to 10 and you have two
documents from 1 to 6 and 5 to 13, you can get all documents within 0 - 10
by querying document.end >= 0 and document.start <= 10.
Whether to use >= or > (and <= or <) depends on your definition of outside
and inside the interval. But note the swapped fields: end is compared with
the lower bound and start with the upper bound.
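
The overlap test described above, written as a Python sketch. The third
document, d3, is a hypothetical addition to show a non-match; the function name
is also hypothetical.

```python
def overlaps(doc_start, doc_end, query_start, query_end):
    """True if [doc_start, doc_end] shares at least one point with the query window."""
    return doc_end >= query_start and doc_start <= query_end

docs = {"d1": (1, 6), "d2": (5, 13), "d3": (11, 13)}
hits = [name for name, (start, end) in docs.items() if overlaps(start, end, 0, 10)]
print(hits)  # ['d1', 'd2']
```

Note that this matches any document that overlaps the window, including d2,
which extends past 10. The original question needs the stricter condition that
the whole requested range is covered.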

Hth
Per
