Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/05/13 01:00:16 UTC

[jira] [Updated] (LUCENE-5648) Index/search multi-valued time durations

     [ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-5648:
---------------------------------

    Attachment: LUCENE-5648.patch

Here it is; including tests.
* Works with all main predicates: Intersects, IsWithin, Contains, IsDisjointTo
* The implementation is split into the core, a NumberRangePrefixTree that knows nothing about calendars, and a DateRangePrefixTree subclass which just has the calendaring specifics.
* Is able to work with any java.util.Calendar passed to it, including those initialized with Long.MIN_VALUE or MAX_VALUE.  Care is taken to check & avoid Calendar/Long underflow.
* Optimized calculation for dates after the "Gregorian Change Date" (October 15th, 1582), for which I basically only need to check for leap years.  Earlier dates use Calendar directly, with more overhead, but that will likely make this work with a variety of Calendars.
* toString() for a cell will use ISO-8601 output, including putting a leading "-" if it's 2BC or before.  1BC is actually the year "0000".  "*" means the universe / all-time.  There is no date parsing in this patch; that is going to happen in a subsequent Solr FieldType.  I might end up moving the code down here for convenience of non-Solr users though.
* The year range is divided into intermediate levels -- there are 1-million-year intervals and 1-thousand-year intervals.  They are aligned at year 0000 (the year before 1AD).
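
To make the calendaring points above concrete, here is a minimal sketch of the Gregorian leap-year check, the ISO-8601 year rendering (1BC as "0000", 2BC as "-0001"), and the alignment of a year to a 1-thousand- or 1-million-year interval. It is not taken from the patch: the class DateRangeSketch and all of its method names are hypothetical, and the floor-based alignment is an assumption about what "aligned at year 0000" means.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical illustration only; not part of the LUCENE-5648 patch. */
final class DateRangeSketch {

  /** Gregorian leap-year rule: every 4th year, except centuries not divisible by 400. */
  static boolean isGregorianLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
  }

  /** Number of days in a month (1..12) of a Gregorian year. */
  static int daysInMonth(int year, int month) {
    switch (month) {
      case 2:
        return isGregorianLeapYear(year) ? 29 : 28;
      case 4: case 6: case 9: case 11:
        return 30;
      default:
        return 31;
    }
  }

  /** Maps Calendar ERA + YEAR to an astronomical year: 1BC -> 0, 2BC -> -1, 1AD -> 1. */
  static int astronomicalYear(Calendar cal) {
    int year = cal.get(Calendar.YEAR);
    return cal.get(Calendar.ERA) == GregorianCalendar.BC ? 1 - year : year;
  }

  /** ISO-8601 style year: at least four digits, with a leading "-" for 2BC and earlier. */
  static String isoYear(int astroYear) {
    String digits = String.format(Locale.ROOT, "%04d", Math.abs((long) astroYear));
    return astroYear < 0 ? "-" + digits : digits;
  }

  /** First year of the interval containing astroYear, with intervals aligned at year 0000. */
  static int intervalStart(int astroYear, int intervalYears) {
    // floorDiv rounds toward negative infinity, so BC years align the same way as AD years.
    return Math.floorDiv(astroYear, intervalYears) * intervalYears;
  }

  public static void main(String[] args) {
    Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
    cal.clear();
    cal.set(Calendar.ERA, GregorianCalendar.BC);
    cal.set(Calendar.YEAR, 2); // 2BC
    System.out.println(isoYear(astronomicalYear(cal)));
    System.out.println(daysInMonth(2000, 2));
    System.out.println(intervalStart(2014, 1000));
  }
}
```

Using astronomical year numbering keeps the year axis contiguous across the BC/AD boundary, which is what lets a single integer comparison or floor division work on both sides of year 0000.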

It uses the changes to the SpatialPrefixTree API in LUCENE-5608, so it's limited to trunk for now. I still want to make some more changes to that API before eventually back-porting it all to 4x.

The patch references some changes in the various filters, which theoretically wouldn't have to be modified for new SPTs.  It's pretty much just comments, and limiting an over-aggressive assertion that couldn't universally hold.

> Index/search multi-valued time durations
> ----------------------------------------
>
>                 Key: LUCENE-5648
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5648
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-5648.patch
>
>
> If you need to index a date/time duration, then the way to do that is to have a pair of date fields; one for the start and one for the end -- pretty straightforward. But if you need to index a variable number of durations per document, then the options aren't pretty, ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations].  Ideally it would be easier to index durations, and it would work in a more optimal way.
> This issue implements the aforementioned feature using Lucene-spatial with a new single-dimensional SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based on floating point numbers. It will have a Date-based customization that indexes levels at meaningful quantities like seconds, minutes, hours, etc.  The point of that alignment is to make it faster to query across meaningful ranges (e.g. [2000 TO 2014]) and to enable a follow-on issue to facet on the data in a really fast way.
> I expect to have a working patch up this week.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
