Posted to solr-user@lucene.apache.org by Christopher Schultz <ch...@christopherschultz.net> on 2018/08/16 13:48:45 UTC

Searching by dates

All,

My understanding is that Solr (really Lucene) only handles temporal data
using full timestamps (date+time, always UTC). I have a use-case where
I'd like to store and search for people by their birth dates, so the
timestamp information is not relevant for me.

I haven't actually tried this, yet, but from the docs I'm guessing that
I can't search for a DOB using e.g. 2018-08-16 but instead I need to
search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.

No user is ever going to do that.

I can also offer a separate form-field for "enter your DOB search here"
and then correctly-format it for Solr/Lucene, but then users can't
conveniently search for e.g. "chris schultz 2018-08-16" and have the DOB
match anything useful.

Is there any standard way of handling dates, or any ideas people have
come up with that kind of work for this use-case?

I could always convert dates to unparsed strings (so I don't get
separate tokens like 2018, 08, and 16 in the document), but then I won't
be able to do range queries against the index.

I would definitely want to be able to search for "chris [born in] august
2018" and find any matches.

Any ideas?

Thanks
-chris


Re: Searching by dates

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
However, you will probably still need to convert your dates into
strings as well to match people's search expectations, as the date
fields do not store _English_ month names internally.

So, you will want to have a secondary field that expands 2018-02-28
into "February 2018" (and "Feb 2018"?) including the analysis pipeline
that does lowercasing.
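
A minimal sketch of how that secondary field's value might be generated
on the client before indexing, in Python. The function name and the
hard-coded English month table are illustrative assumptions, not
anything Solr provides; lowercasing is left to the field's analysis
chain, as described above.

```python
from datetime import date

# English month names are hard-coded so the output does not depend on
# the locale of the machine doing the indexing.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def month_year_variants(iso_date):
    """Return "Month YYYY" and "Mon YYYY" strings for a yyyy-mm-dd date.

    Hypothetical helper: the result would be sent to a secondary text
    field alongside the real date field.
    """
    # fromisoformat raises ValueError for impossible dates (e.g. Feb 31).
    d = date.fromisoformat(iso_date)
    name = MONTHS[d.month - 1]
    variants = [f"{name} {d.year}", f"{name[:3]} {d.year}"]
    # "May" and its abbreviation are identical, so drop duplicates
    # while keeping the original order.
    return list(dict.fromkeys(variants))


print(month_year_variants("2018-08-16"))
```

Both strings can go into one multi-valued text field, so a query for
either "august 2018" or "aug 2018" has something to match.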

Regards,
   Alex.

On 16 August 2018 at 10:37, Shawn Heisey <ap...@elyograg.org> wrote:
> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>> I haven't actually tried this, yet, but from the docs I'm guessing that
>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>
>> No user is ever going to do that.
>
> If you use the field class called DateRangeField, instead of the trie or
> point classes, you can get what you're after.
>
> It allows both searching and indexing dates as vague as "2018".
>
> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html
>
> For an existing index, you will have to change the schema and completely
> reindex.
>
> Thanks,
> Shawn
>

Re: Searching by dates

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You could have PatternReplace in your field definition either as a
CharFilter or a TokenFilter. See:
http://www.solr-start.com/info/analyzers/

Regards,
   Alex.

On 16 August 2018 at 11:20, Christopher Schultz
<ch...@christopherschultz.net> wrote:
> Shawn,
>
> On 8/16/18 10:37 AM, Shawn Heisey wrote:
>> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>>> I haven't actually tried this, yet, but from the docs I'm guessing that
>>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>>
>>> No user is ever going to do that.
>>
>> If you use the field class called DateRangeField, instead of the trie or
>> point classes, you can get what you're after.
>>
>> It allows both searching and indexing dates as vague as "2018".
>>
>> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html
>
> Hmm. I could have sworn the documentation I read in the past (maybe as
> long as 3-4 months ago) indicated that date+timestamp was necessary.
> Maybe that was just for the index, while the searches can be partial.
>
> As long as users don't have to enter timestamps to search, I think all
> is well in terms of index/search for me.
>
> As for i18n, is there a way to have the query analyzer convert strings
> like "mm/dd/yyyy" into "yyyy-mm-dd"?
>
> I'm sure we can take the query (before handing-off to Solr), look for
> anything that looks like a date and convert it into ISO-8601 for
> searching, but if Solr already provides a facility to do that, I'd
> rather not complicate my code in order to get it working.
>
>> For an existing index, you will have to change the schema and completely
>> reindex.
>
> That's okay. The index doesn't actually exist, yet :) This is all just
> planning.
>
> Thanks,
> -chris
>

Re: Searching by dates

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/16/2018 9:20 AM, Christopher Schultz wrote:
> Hmm. I could have sworn the documentation I read in the past (maybe as
> long as 3-4 months ago) indicated that date+timestamp was necessary.
> Maybe that was just for the index, while the searches can be partial.

DateRangeField was introduced four years ago, first available in Solr
version 5.0.

https://issues.apache.org/jira/browse/SOLR-6103

> As for i18n, is there a way to have the query analyzer convert strings
> like "mm/dd/yyyy" into "yyyy-mm-dd"?

Solr doesn't accept dates in mm/dd/yyyy syntax, and can't convert that
for you.  The ISO standard that *is* accepted is the more logical
yyyy-mm-dd.  It's generally best if you don't use a freeform text field
for dates ... provide a full interface for choosing specific dates so
that user input is predictable.  Probably something like this:

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date

Looking at the documentation, I don't see any way to search for just a
day without the year.  That could be a useful enhancement for
birthday-related use cases, but I have no idea how hard it would be to
write.
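
One possible workaround, offered only as an untested assumption and not
something from the Solr documentation: derive a year-less "MM-DD" key on
the client and index it in a separate plain string field next to the
real date field. The field and function names below are hypothetical.

```python
from datetime import date


def month_day_key(iso_date):
    """Return a zero-padded "MM-DD" key for a yyyy-mm-dd date.

    Hypothetical helper: the key would be indexed in its own string
    field (say, dob_month_day) so a birthday can be matched in any year.
    """
    d = date.fromisoformat(iso_date)  # ValueError on impossible dates
    return f"{d.month:02d}-{d.day:02d}"


print(month_day_key("2018-08-16"))
```

A birthday search would then be an exact match on that key, while range
queries on the full date keep using the date field.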

Thanks,
Shawn


Re: Searching by dates

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Shawn,

On 8/16/18 10:37 AM, Shawn Heisey wrote:
> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>> I haven't actually tried this, yet, but from the docs I'm guessing that
>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>
>> No user is ever going to do that.
> 
> If you use the field class called DateRangeField, instead of the trie or
> point classes, you can get what you're after.
> 
> It allows both searching and indexing dates as vague as "2018".
> 
> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html

Hmm. I could have sworn the documentation I read in the past (maybe as
long as 3-4 months ago) indicated that date+timestamp was necessary.
Maybe that was just for the index, while the searches can be partial.

As long as users don't have to enter timestamps to search, I think all
is well in terms of index/search for me.

As for i18n, is there a way to have the query analyzer convert strings
like "mm/dd/yyyy" into "yyyy-mm-dd"?

I'm sure we can take the query (before handing-off to Solr), look for
anything that looks like a date and convert it into ISO-8601 for
searching, but if Solr already provides a facility to do that, I'd
rather not complicate my code in order to get it working.
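
The pre-processing step described above (find anything that looks like a
date and rewrite it as ISO-8601 before the query reaches Solr) could be
sketched in Python as follows. It assumes US-style month/day/year input
with a four-digit year; the function name is hypothetical, and other
locales would need their own patterns.

```python
import re
from datetime import date

# Assumption: month comes first, as in "mm/dd/yyyy". One- or two-digit
# month and day are accepted; the year must have four digits.
US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def normalize_dates(query):
    """Rewrite m/d/yyyy tokens in a free-text query as yyyy-mm-dd."""

    def to_iso(match):
        month, day, year = (int(g) for g in match.groups())
        try:
            return date(year, month, day).isoformat()
        except ValueError:
            # Not a real calendar date: leave the token untouched.
            return match.group(0)

    return US_DATE.sub(to_iso, query)


print(normalize_dates("chris schultz 8/16/2018"))
```

Tokens that only look like dates (month 13, February 31) pass through
unchanged, so the rest of the query is never damaged by the rewrite.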

> For an existing index, you will have to change the schema and completely
> reindex.

That's okay. The index doesn't actually exist, yet :) This is all just
planning.

Thanks,
-chris


Re: Searching by dates

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/16/2018 7:48 AM, Christopher Schultz wrote:
> I haven't actually tried this, yet, but from the docs I'm guessing that
> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>
> No user is ever going to do that.

If you use the field class called DateRangeField, instead of the trie or
point classes, you can get what you're after.

It allows both searching and indexing dates as vague as "2018".

https://lucene.apache.org/solr/guide/7_4/working-with-dates.html
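
To illustrate the kind of values a DateRangeField accepts, here is a
small Python sketch that builds query clauses from truncated dates. The
field name "dob" and both helper names are hypothetical, and the clause
syntax follows the truncated-date and [start TO end] forms described in
the reference guide linked above; it has not been checked against a
running Solr instance here.

```python
import re

# Accepts yyyy, yyyy-mm or yyyy-mm-dd (date-only precision).
PARTIAL_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def _check(value):
    if not PARTIAL_DATE.fullmatch(value):
        raise ValueError(f"not a yyyy[-mm[-dd]] date: {value!r}")
    return value


def dob_clause(value, field="dob"):
    """Clause matching the whole span named by a truncated date."""
    return f"{field}:{_check(value)}"


def dob_range_clause(start, end, field="dob"):
    """Clause matching everything from start through end."""
    return f"{field}:[{_check(start)} TO {_check(end)}]"


print(dob_clause("2018-08"))               # all of August 2018
print(dob_range_clause("2018", "2018-08-16"))
```

The same helpers cover "born in August 2018" (a truncated date) and
"born between X and Y" (a range), without the user typing a timestamp.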

For an existing index, you will have to change the schema and completely
reindex.

Thanks,
Shawn