Posted to java-user@lucene.apache.org by James Levine <jd...@gmail.com> on 2005/04/21 23:22:23 UTC

sorting on "dates" a little fuzzy...

I have an index of around 3 million records, and typical queries
can return anywhere between 1 and 400,000 results.

We have indexed "dateTime" fields in the form 20050415142, that is, to
10-minute precision.
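
For readers following along, here is a minimal sketch of how such a key could be produced. It assumes the 11-character value is yyyyMMddHHmm with the last minute digit dropped; the class name DateTimeKey and the method toKey are hypothetical and are not taken from the poster's code.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class DateTimeKey {
    // Hypothetical helper: formats a Date as yyyyMMddHHm (always 11 characters),
    // i.e. minute-level text with the final minute digit dropped.
    public static String toKey(Date date, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmm", Locale.US);
        fmt.setTimeZone(zone);
        String minutes = fmt.format(date);
        // Dropping the last digit leaves 10-minute precision.
        return minutes.substring(0, minutes.length() - 1);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = new GregorianCalendar(utc);
        cal.clear();
        cal.set(2005, Calendar.APRIL, 15, 14, 23, 0);
        System.out.println(toKey(cal.getTime(), utc)); // prints 20050415142
    }
}
```

Because every key produced this way has the same width, plain string comparison orders the keys chronologically.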

When I try to sort queries I get something back that is roughly sorted
on index, but not quite. Results are out of order just a bit. The
size of the result set does not seem to be related to the occurrence of
this problem.

We've tried Lucene 1.4-final and 1.4.3.

My code looks like this:

// sort on dateTime in reverse (newest first), then break ties by score
s = new Sort( new SortField[] {
    new SortField( "dateTime", SortField.STRING, true ),
    SortField.FIELD_SCORE } );

...

hits = searcher.search( qry, s );


Any help is appreciated, I'm so far baffled by this problem.

Regards,
James

Re: sorting on "dates" a little fuzzy...

Posted by Che Dong <ch...@chedong.com>.
As Google has said, a full-text search service is not a traditional
database application, and Lucene is not a database either. If you want
to sort on a field such as date, you had better pre-sort the documents
on that field before they are indexed, then return results in doc id order.

In Lucene, sorting is only practical for the top hits. If you sort 400k
result hits by date, you lose the speed of Lucene.
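
The pre-sorting idea above could be sketched roughly as follows. This is only an illustration under assumptions: the Record class and the newestFirst method are hypothetical, and the actual indexing step is left out. Documents would be added to the index in the returned order, so that doc id order matches date order. Note that this only holds for a one-off batch build; documents added later get higher doc ids regardless of their date.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PreSortByDate {
    // Hypothetical record holding just what the illustration needs.
    public static final class Record {
        public final String id;
        public final String dateTime; // fixed-width yyyyMMddHHm key

        public Record(String id, String dateTime) {
            this.id = id;
            this.dateTime = dateTime;
        }
    }

    // Returns a copy of the records ordered newest first.
    // The input list is left untouched.
    public static List<Record> newestFirst(List<Record> records) {
        List<Record> sorted = new ArrayList<Record>(records);
        Collections.sort(sorted, new Comparator<Record>() {
            public int compare(Record a, Record b) {
                return b.dateTime.compareTo(a.dateTime);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Record> records = Arrays.asList(
            new Record("a", "20050415142"),
            new Record("b", "20050416090"),
            new Record("c", "20050101000"));
        for (Record r : newestFirst(records)) {
            // Each record would be added to the index here, in this order.
            System.out.println(r.id + " " + r.dateTime);
        }
    }
}
```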


Thanks

Che Dong
http://www.chedong.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: sorting on "dates" a little fuzzy...

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I don't have a solution, but rather some questions to check: are all 
dateTime values the same width, zero padded on the right?  Does every 
document have a dateTime field?
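
To illustrate why the width question matters, here is a small sketch (the class and method names are made up for this example). String comparison goes character by character, so a key that is one digit short can sort in the wrong place relative to its numeric value.

```java
import java.util.Arrays;
import java.util.List;

public class KeyWidthCheck {
    // True when every key has the same number of characters.
    public static boolean allSameWidth(List<String> keys) {
        if (keys.isEmpty()) {
            return true;
        }
        int width = keys.get(0).length();
        for (String key : keys) {
            if (key.length() != width) {
                return false;
            }
        }
        return true;
    }

    // True when comparing a and b as strings gives the same ordering
    // as comparing them as numbers.
    public static boolean stringOrderMatchesNumeric(String a, String b) {
        int byString = Integer.signum(a.compareTo(b));
        int byNumber = Integer.signum(Long.compare(Long.parseLong(a), Long.parseLong(b)));
        return byString == byNumber;
    }

    public static void main(String[] args) {
        // Same width: string order and numeric order agree.
        System.out.println(stringOrderMatchesNumeric("20050415142", "20050415150")); // true
        // One key is a digit short: the two orderings disagree.
        System.out.println(stringOrderMatchesNumeric("20050415142", "2005041515"));  // false
        System.out.println(allSameWidth(Arrays.asList("20050415142", "2005041515"))); // false
    }
}
```

Running a check like allSameWidth over the values fed to the indexer would be one way to rule this cause in or out.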

I recommend you sort with type INT instead of STRING if it fits, or 
maybe LONG.  STRING will use the most resources for sorting.
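
On the "if it fits" part: a quick way to check is sketched below (fitsInInt is a hypothetical helper, not a Lucene call). Assuming an INT sort parses each term as a 32-bit integer, the 11-digit sample value from the original post is larger than Integer.MAX_VALUE (2147483647) and would not fit.

```java
public class SortTypeFit {
    // True when the numeric key can be represented as a 32-bit int.
    public static boolean fitsInInt(String key) {
        long value = Long.parseLong(key);
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 11-digit key at 10-minute precision, as in the original post.
        System.out.println(fitsInInt("20050415142")); // false
        // The same timestamp cut to hour precision (yyyyMMddHH, 10 digits).
        System.out.println(fitsInInt("2005041514"));  // true
    }
}
```

So a key at hour precision would fit in an int, while the 10-minute key needs a wider type or a string sort.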

	Erik

