Posted to java-user@lucene.apache.org by Philippe Ombredanne <po...@nexb.com> on 2004/03/12 20:40:48 UTC

An InvertedDateField implementation : comments?

Dear Lucene enthusiasts,
Following the discussion thread on backward term enumeration,
http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgId=1381575
I have created a simple draft implementation of an inverted date field.

I need it badly as we manage versions of documents with Lucene, and that
is a cool way to get to the latest version.

The route I took is to store the complement of the date with respect to
the MAX_DATE implied by DateField, that is, the largest long that can be
managed on 9 bytes (DATE_LEN): zzzzzzzzz parsed as a base-36 long, which
as a date falls sometime in April 5188.....(;-)

You can still keep a somewhat compact date (9 bytes), and get the right
ordering.
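
To make the idea concrete, here is a minimal, self-contained sketch of the
inversion. It is not the attached class: the name InvertedDateSketch and its
method names are made up for illustration, and it re-implements the base-36,
9-character padding itself rather than delegating to DateField.

```java
// Hypothetical sketch: the class and method names are illustrative only.
class InvertedDateSketch {
    // Same width as DateField's fixed-length date strings (assumed here to be 9).
    static final int DATE_LEN = 9;

    // Largest value that fits in DATE_LEN base-36 digits: "zzzzzzzzz".
    static final long MAX_TIME = Long.parseLong("zzzzzzzzz", Character.MAX_RADIX);

    // Encode a time in milliseconds so that LATER times sort FIRST.
    static String timeToInvertedString(long time) {
        if (time < 0 || time > MAX_TIME) {
            throw new IllegalArgumentException("time out of range: " + time);
        }
        // Take the complement with respect to MAX_TIME, then left-pad with zeros
        // so that every encoded value has exactly DATE_LEN characters.
        String s = Long.toString(MAX_TIME - time, Character.MAX_RADIX);
        StringBuilder sb = new StringBuilder(DATE_LEN);
        for (int i = s.length(); i < DATE_LEN; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    // Decode an inverted string back to a time in milliseconds.
    static long invertedStringToTime(String inverted) {
        return MAX_TIME - Long.parseLong(inverted, Character.MAX_RADIX);
    }
}
```

With this encoding the most recent version of a document is the first term
met when enumerating the field forward, so no backward enumeration is needed.
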
Things that may need additional work for a fully functional inverted
date feature:
- An InvertedDateFilter,
- updates to QueryParser and Field to manage inverted dates and inverted
date ranges, and semantics like "Most recent document"

I have attached my draft implementation of an InvertedDateField helper
class, which delegates most of its work to DateField and properly handles
the conversions to and from inverted date Strings, Dates, times, and
inverted times.
I still keep the constraints/limitations of DateField, but manage the
inversion.
By the way, why is DateField.DATE_LEN private?

I have also added a simple test class.

I am donating this code and assigning copyright to the ASF as a
contribution.
What about making it part of the base package?

Comments on the approach are badly needed.
Another approach, posted by Matt Quail, is to use a LongField.
See below.
Cheers
Philippe

-----Original Message-----
From: Matt Quail [mailto:matt@ctx.com.au] 
Sent: Saturday, February 28, 2004 8:17 PM
To: Philippe Ombredanne
Subject: Re: Iterating TermEnum backwards - looking for the inverted
date


Philippe,

Your class certainly looks adequate. While you were implementing yours, 
I implemented a full "LongField" class :D

I posted it here:
http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgNo=5329

It correctly encodes all long values (including negative values). This 
means to store an "inverted" date, you can just make it negative. Long 
values are encoded into a string 14 characters long.

I handle the '-' case by prefixing positive numbers with a character 
greater than '-' (I use '0').
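
A rough sketch of one encoding with these properties follows. It is not the
posted LongField class: the name SortableLongSketch, the base-36 digits, and
the way negative values are offset from Long.MIN_VALUE are all assumptions
chosen so that string order matches numeric order in 14 characters.

```java
// Hypothetical sketch: not the actual LongField implementation.
class SortableLongSketch {
    // Long.MAX_VALUE needs 13 base-36 digits; one more character carries the sign.
    static final int DIGITS = 13;

    static String encode(long value) {
        if (value < 0) {
            // Offset from Long.MIN_VALUE gives a non-negative number that
            // grows as the value grows, so negatives sort among themselves.
            return "-" + pad(Long.toString(value - Long.MIN_VALUE, Character.MAX_RADIX));
        }
        // '0' is greater than '-', so every non-negative value sorts after
        // every negative one.
        return "0" + pad(Long.toString(value, Character.MAX_RADIX));
    }

    static long decode(String s) {
        long magnitude = Long.parseLong(s.substring(1), Character.MAX_RADIX);
        return s.charAt(0) == '-' ? magnitude + Long.MIN_VALUE : magnitude;
    }

    // Left-pad with zeros to a fixed width so lexicographic order is numeric order.
    private static String pad(String digits) {
        StringBuilder sb = new StringBuilder(DIGITS);
        for (int i = digits.length(); i < DIGITS; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }
}
```

Under these assumptions an inverted date is simply encode(-time): a later
time gives a smaller negated value and therefore an earlier term.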

=Matt