Posted to general@lucene.apache.org by Scott Tiger <m....@gmail.com> on 2007/09/06 19:43:07 UTC

performance of sorting by date

I want to search documents and sort the results mainly by a datetime field.
Which implementation gives the best sorting performance?

1. index the datetime as one field.

fields: title, contents, datetime

In this case, if there are documents whose datetime increases by 1 second
from the start of 2007 to the end of 2007, the number of terms becomes
about 31536000 (365 * 24 * 60 * 60 seconds).

2. index the datetime as 6 fields.

fields: title, contents, year, month, day, hour, minute, second.

In this case, the number of terms in each field is:
 year : 1
 month : 12
 day : 31
 hour : 24
 minute : 60
 second : 60
188 terms in total. But sorting needs 6 fields.

sample code:
String[] sortFields = { "year", "month", "day", "hour", "minute", "second" };
Sort sort = new Sort(sortFields);
Hits hits = searcher.search(query, sort);
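
The ordering that a six-field sort is meant to produce can be checked outside Lucene. The following is only a sketch in plain Java (it uses java.time, not the Lucene API, and the class and method names are made up for illustration). It assumes timestamps with second resolution and compares year, then month, day, hour, minute, second, in the same priority as the sortFields array above, against sorting by the whole timestamp.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class SixFieldSortCheck {
    // Compare on year first, then month, day, hour, minute, second.
    static final Comparator<LocalDateTime> BY_SIX_FIELDS =
            Comparator.<LocalDateTime>comparingInt(LocalDateTime::getYear)
                    .thenComparingInt(LocalDateTime::getMonthValue)
                    .thenComparingInt(LocalDateTime::getDayOfMonth)
                    .thenComparingInt(LocalDateTime::getHour)
                    .thenComparingInt(LocalDateTime::getMinute)
                    .thenComparingInt(LocalDateTime::getSecond);

    // Returns a sorted copy using the six-field comparator.
    static List<LocalDateTime> sortBySixFields(List<LocalDateTime> input) {
        List<LocalDateTime> copy = new ArrayList<>(input);
        copy.sort(BY_SIX_FIELDS);
        return copy;
    }

    // Returns a sorted copy using the natural (chronological) order.
    static List<LocalDateTime> sortByTimestamp(List<LocalDateTime> input) {
        List<LocalDateTime> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<LocalDateTime> docs = Arrays.asList(
                LocalDateTime.of(2007, 9, 6, 19, 43, 7),
                LocalDateTime.of(2007, 1, 1, 0, 0, 0),
                LocalDateTime.of(2007, 9, 6, 19, 43, 6));
        // Prints whether both orderings agree for this sample.
        System.out.println(sortBySixFields(docs).equals(sortByTimestamp(docs)));
    }
}
```

In Lucene itself the six values would have to be indexed so that each field sorts numerically (or as zero-padded strings) for the same equivalence to hold.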

(A 3rd approach: index the datetime as 2 fields, yyyymmdd and hhmmss.)
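
As a rough sketch of the 3rd approach, the two field values could be derived from one timestamp as below. This is plain Java using java.time, not Lucene code, and the class and method names are hypothetical. With one year of data at second resolution, this layout would have at most 365 distinct date terms plus 86400 distinct time terms.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene.
class TwoFieldDate {
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmmss");

    // Value for the yyyymmdd field, zero-padded so string order matches date order.
    static String datePart(LocalDateTime t) {
        return t.format(DATE);
    }

    // Value for the hhmmss field, zero-padded so string order matches time order.
    static String timePart(LocalDateTime t) {
        return t.format(TIME);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2007, 9, 6, 19, 43, 7);
        System.out.println(datePart(t) + " " + timePart(t));
    }
}
```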

I also need to reopen the index periodically, about every 1-10 minutes,
because documents are added and deleted.


Thanks.

Re: performance of sorting by date

Posted by Scott Tiger <m....@gmail.com>.
I have just tested this case myself.

> 1. index the datetime as one field.

In this case, the first query (not served from cache) responds very slowly.
It seems that the FieldCache is too big.
The 2nd query is very fast; it seems to be cached.
Also, I cannot use RangeQuery because it produces too many clauses.

e.g. datetime:[20070101000000 TO 20071231235959]
This range contains 31536000 terms.

> 2. index the datetime as 6 fields.

This is what I recommend. The first query is not slow; it is fast.
The 2nd query is also very fast.
Another advantage is that RangeQuery runs very fast.

e.g. yyyy:[2007 TO 2007] AND mm:[1 TO 4]
This contains only 5 terms.
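
The two term counts can be reproduced with a small calculation. The sketch below is plain Java with hypothetical names, not Lucene code, and it assumes the worst case where every value inside the range actually occurs in the index. For comparison, the default clause limit of BooleanQuery is 1024, as far as I know.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Lucene.
class RangeTermCount {
    // Number of distinct second-resolution values in [from, to], both ends inclusive.
    static long secondsInRange(LocalDateTime from, LocalDateTime to) {
        return ChronoUnit.SECONDS.between(from, to) + 1;
    }

    // Number of integer values in [from, to], both ends inclusive.
    static int valuesInRange(int from, int to) {
        return to - from + 1;
    }

    public static void main(String[] args) {
        // Single datetime field: one term per second of 2007.
        System.out.println(secondsInRange(
                LocalDateTime.of(2007, 1, 1, 0, 0, 0),
                LocalDateTime.of(2007, 12, 31, 23, 59, 59)));
        // Split fields: terms matched by the year range plus the month range.
        System.out.println(valuesInRange(2007, 2007) + valuesInRange(1, 4));
    }
}
```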

I have 2,000,000 documents in the index,
and the first query responds in about 1500ms.

Thanks.
