Posted to solr-user@lucene.apache.org by Erik Fäßler <er...@uni-jena.de> on 2011/10/29 15:12:04 UTC

Uncomplete date expressions

Hi all,

I want to index MEDLINE documents, which do not always contain complete publication dates. The year is always known. However, the Solr documentation states that dates must have the format "1995-12-31T23:59:59Z", which requires the month, the day and even the time of day to be known.
I could, of course, just fill in incomplete dates with default values, 01-01 for example. But then I would no longer be able to distinguish between complete and incomplete dates afterwards, which matters when displaying the documents.

I could just store the known information, e.g. the year, in an integer-typed field, but then I would not have date math.

Is there a good solution to my problem? Probably I'm just missing the obvious, perhaps you can help me :-)

Best regards,

	Erik

Re: Uncomplete date expressions

Posted by Chris Hostetter <ho...@fucit.org>.
: But Solr is (intentionally) stupid about dates, and
: requires the (almost) full date format. There are

I'm not sure how I feel about "intentionally stupid" ... but the
underlying sentiment is correct: Solr requires clients to be *VERY* 
explicit about dates, because that way the client is in control of all the 
hard decisions like:

 * how should partial dates sort?
 * how should date math be applied to partial dates?
 * when should a partial date be considered part of a range query?
 * when should a date be considered part of a range query where the 
endpoints are partial?

In many cases, what people with "fuzzy" dates are conceptually dealing 
with should best be thought of not as specific "moments" (which is what 
the date field encapsulates) but as "events" consisting of multiple 
dates: start+end, or maybe start+effective+end.

So if you know *exactly* when something happened, you index it as both the
start & end fields.  But if you only know that something happened in 1969,
you index start=1969-01-01 & end=1969-12-31 & effective= ... whatever you
want (Jan 1? Jun 1? Dec 31? no value at all?).  Then if someone searches
for "anything that happened during my lifetime: 1969-05-13T00:00:00Z TO
NOW" you can make the decision at query time whether to use the start, end,
or effective field in your query (where the right decision depends
entirely on the nature of your data and your UI).
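
A minimal Python sketch of the start/end idea described above, under stated assumptions: the helper names (partial_to_range, lifetime_query) and the field names (event_start, event_end) are hypothetical and not part of Solr, and partial dates are assumed to arrive as "YYYY", "YYYY-MM" or "YYYY-MM-DD". It expands a partial date into the earliest and latest instant it could denote (to one-second granularity) and builds a range query against whichever field the client picks.

```python
import calendar
import re
from datetime import date

# Assumed input shapes: "YYYY", "YYYY-MM" or "YYYY-MM-DD".
_PARTIAL = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def partial_to_range(text):
    """Return (start, end) Solr-style date strings covering a partial date."""
    m = _PARTIAL.match(text)
    if m is None:
        raise ValueError(f"unsupported date: {text!r}")
    year = int(m.group(1))
    month = int(m.group(2)) if m.group(2) else None
    day = int(m.group(3)) if m.group(3) else None

    # Unknown month: the range spans January through December.
    first_month, last_month = (month, month) if month else (1, 12)
    if day:
        first_day = last_day = day
    else:
        # Unknown day: span the first through the last day of the month.
        first_day = 1
        last_day = calendar.monthrange(year, last_month)[1]

    # date() raises ValueError for impossible dates such as month 13.
    start = date(year, first_month, first_day)
    end = date(year, last_month, last_day)
    return f"{start.isoformat()}T00:00:00Z", f"{end.isoformat()}T23:59:59Z"


def lifetime_query(field, since):
    """Build a range query string; the caller decides which field to use."""
    return f"{field}:[{since} TO NOW]"


start, end = partial_to_range("1969")
doc = {"id": "example-1", "event_start": start, "event_end": end}

# Strict: only events that certainly happened after the given moment.
strict = lifetime_query("event_start", "1969-05-13T00:00:00Z")
# Lenient: events that may have happened after the given moment.
lenient = lifetime_query("event_end", "1969-05-13T00:00:00Z")
```

With these two fields, an event known only as "1969" matches the lenient query but not the strict one, which is exactly the kind of decision left to the client.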


-Hoss

Re: Uncomplete date expressions

Posted by Erick Erickson <er...@gmail.com>.
Well, if Solr understood partial dates, how would you then know
whether the original was partial or not? It would all look the
same when you pulled it out...

But Solr is (intentionally) stupid about dates, and
requires the (almost) full date format. There are
a few zeros you can leave off, but not many...
And certainly just the year isn't supported.

You can *store* the original input, but *index*
(and search and range and facet) on the normalized
date, so your display for the end user is just the
stored form.
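
As a rough illustration of the store-versus-index split, here is a small Python sketch that builds a document carrying both forms. The function name build_doc and the field names pub_date and pub_date_raw are assumptions made up for this example; in a real schema, pub_date would be the indexed date field and pub_date_raw a stored-only string field. Missing month and day are padded with "01", which is one possible default, not a recommendation.

```python
def build_doc(doc_id, raw_date):
    """Build a document with a normalized date plus the original input.

    raw_date is assumed to be "YYYY", "YYYY-MM" or "YYYY-MM-DD".
    """
    parts = raw_date.split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported date: {raw_date!r}")
    # Pad the unknown month and/or day with "01" (an arbitrary default).
    padded = parts + ["01"] * (3 - len(parts))
    return {
        "id": doc_id,
        # Normalized form: used for searching, ranges, faceting, date math.
        "pub_date": "-".join(padded) + "T00:00:00Z",
        # Original form: only stored, and shown to the end user as-is.
        "pub_date_raw": raw_date,
    }


doc = build_doc("pmid-1", "1995")
```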

Best
Erick


Re: Uncomplete date expressions

Posted by Erik Fäßler <er...@uni-jena.de>.
Hello François,

Thank you for your quick reply. I had thought about just storing which information is missing, and that would of course be a possibility. It just seemed a bit quick-and-dirty to me, and I wondered whether Solr really cannot understand dates that consist only of the year. Isn't it common for a date/time expression not to be specified down to the hour, for example? But if there is no other possibility, I will stick with your suggestion. Thank you!

Best,

	Erik

On 29.10.2011 at 15:20, François Schiettecatte wrote:

> Erik
> 
> I would complete the date with default values as you suggest and store a boolean flag indicating whether the date was complete. Alternatively, store the original date when it is incomplete, which would probably be better: the presence of that field would tell you that the original date was incomplete, and you would also have the original value.
> 
> Cheers
> 
> François
> 


Re: Uncomplete date expressions

Posted by François Schiettecatte <fs...@gmail.com>.
Erik

I would complete the date with default values as you suggest and store a boolean flag indicating whether the date was complete. Alternatively, store the original date when it is incomplete, which would probably be better: the presence of that field would tell you that the original date was incomplete, and you would also have the original value.
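
On the display side, the second variant could look roughly like the Python sketch below. The function name display_date and the field names pub_date and pub_date_original are hypothetical, chosen only for illustration; the sketch assumes pub_date_original is present only on documents whose date was incomplete.

```python
def display_date(doc):
    """Pick the text to show for a document's publication date."""
    original = doc.get("pub_date_original")
    if original is not None:
        # The field is present, so the date was incomplete: show it as given.
        return original
    # Complete date: show the calendar day of the normalized value.
    return doc["pub_date"][:10]


partial = {"pub_date": "1995-01-01T00:00:00Z", "pub_date_original": "1995"}
complete = {"pub_date": "1995-12-31T23:59:59Z"}
shown = [display_date(partial), display_date(complete)]
```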

Cheers

François
