Posted to java-user@lucene.apache.org by Ganesh <em...@yahoo.co.in> on 2009/01/13 10:07:14 UTC

Best way to do date sort

I am indexing and storing a date-time value with minute resolution. I need to
run date range queries and also sort on this field. I have almost 30
million records spread across 20 databases.

Option 1:
Index the date-time as a single string.

Option 2:
Index the date, hour and minute separately as numbers.

Which option will consume less memory?

Will a date range query also load all of the field's data through FieldCacheImpl?

Regards
Ganesh 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Best way to do date sort

Posted by Ganesh <em...@yahoo.co.in>.
I have about a million documents per day to index. The DateTime field has minute resolution.

How much memory would I save by splitting this into multiple fields (one field containing YYYYMMDD, one field with HH and one with MM)?

Could anyone show me how to calculate the memory needed for sorting when multiple fields are used? I think the FieldCache will load all unique terms and keep a pointer to the respective documents. If so, splitting the date field will consume more memory.
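
One rough way to approach this calculation is sketched below in plain Java. It assumes a simplified cost model: each sorted field holds one 4-byte entry per document plus one cached String per unique term. The class name, the constants (4 bytes per document entry, 40 bytes of fixed overhead per String) and the 30-day window are illustrative assumptions, not measurements of Lucene.

```java
// Back-of-envelope sizing for sort caches. All constants are assumptions.
final class SortMemoryEstimate {
    // Assumed: one 4-byte int per document for each sorted field.
    static final long BYTES_PER_DOC_ENTRY = 4;
    // Assumed: rough fixed JVM cost of one cached String, excluding its chars.
    static final long STRING_OVERHEAD_BYTES = 40;

    /** Cost of the per-document array for one sorted field. */
    static long perDocBytes(long numDocs) {
        return numDocs * BYTES_PER_DOC_ENTRY;
    }

    /** Cost of the unique-term table for one sorted field (2 bytes per char). */
    static long termTableBytes(long uniqueTerms, int charsPerTerm) {
        return uniqueTerms * (STRING_OVERHEAD_BYTES + 2L * charsPerTerm);
    }

    /** Total estimated cost of one string-sorted field. */
    static long stringFieldBytes(long numDocs, long uniqueTerms, int charsPerTerm) {
        return perDocBytes(numDocs) + termTableBytes(uniqueTerms, charsPerTerm);
    }

    public static void main(String[] args) {
        long numDocs = 30_000_000L; // from the question: almost 30 million records
        long days = 30;             // assumed: 30 days at about 1M documents per day

        // Single field "yyyyMMddHHmm": at most 1440 distinct minutes per day.
        long single = stringFieldBytes(numDocs, days * 1440, 12);

        // Split fields: date (8 chars), hour (2 chars), minute (2 chars).
        long split = stringFieldBytes(numDocs, days, 8)
                   + stringFieldBytes(numDocs, 24, 2)
                   + stringFieldBytes(numDocs, 60, 2);

        System.out.println("single field: " + single + " bytes");
        System.out.println("split fields: " + split + " bytes");
    }
}
```

Under these assumptions the per-document arrays dominate, so the number of sorted fields matters more than the number of unique terms. With a finer resolution or many more distinct values, the term table would weigh more heavily.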

Regards
Ganesh

----- Original Message ----- 
From: "Erick Erickson" <er...@gmail.com>
To: <ja...@lucene.apache.org>
Sent: Tuesday, January 13, 2009 7:17 PM
Subject: Re: Best way to do date sort


> This question, along with many possible answers has been
> discussed many times, so there's a wealth of information
> in the searchable archive.
> 
> The short form is "it depends". Do you want to sort? In
> that case storing a single field will cost you when sorting.
> Store the coarsest granularity you can. Consider breaking
> up the date field (i.e. one field containing YYYYMMDD,
> perhaps one field containing HHMM or even one field
> with HH and one with MM).
> 
> This kind of strategy will save you far more space than worrying
> about strings vs number. And will sort faster. etc.
> 
> Best
> Erick

Re: Best way to do date sort

Posted by Erick Erickson <er...@gmail.com>.
This question, along with many possible answers, has been
discussed many times, so there's a wealth of information
in the searchable archive.

The short form is "it depends". Do you want to sort? In
that case storing everything in a single field will cost you
memory when sorting. Store the coarsest granularity you can.
Consider breaking up the date field (e.g. one field containing
YYYYMMDD, perhaps one field containing HHMM, or even one field
with HH and one with MM).

This kind of strategy will save you far more space than worrying
about strings vs. numbers, and it will sort faster.
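
To make the effect of splitting concrete, the following sketch counts the distinct terms produced by a single minute-resolution field versus separate date, hour and minute fields. It uses only the JDK; the class name, method name and three-day sample window are hypothetical choices for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

// Counts distinct terms for one combined field vs. three split fields.
final class SplitDateTerms {
    static final DateTimeFormatter FULL = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");
    static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("mm");

    /**
     * Walks `minutes` consecutive minutes from `start` and returns the number
     * of distinct terms as {full, date, hour, minute}.
     */
    static int[] countUniqueTerms(LocalDateTime start, int minutes) {
        Set<String> full = new HashSet<>();
        Set<String> date = new HashSet<>();
        Set<String> hour = new HashSet<>();
        Set<String> minute = new HashSet<>();
        for (int i = 0; i < minutes; i++) {
            LocalDateTime t = start.plusMinutes(i);
            full.add(t.format(FULL));
            date.add(t.format(DATE));
            hour.add(t.format(HOUR));
            minute.add(t.format(MINUTE));
        }
        return new int[] {full.size(), date.size(), hour.size(), minute.size()};
    }

    public static void main(String[] args) {
        // Three full days of minute-resolution timestamps.
        int[] c = countUniqueTerms(LocalDateTime.of(2009, 1, 13, 0, 0), 3 * 24 * 60);
        System.out.println("combined field terms: " + c[0]);
        System.out.println("date/hour/minute terms: " + c[1] + "/" + c[2] + "/" + c[3]);
    }
}
```

The combined field gains up to 1440 new terms per day, while the split fields gain one date term per day and stay fixed at 24 hour terms and 60 minute terms.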

Best
Erick


Re: Best way to do date sort

Posted by prabin meitei <pr...@gmail.com>.
In my experience, the best and simplest option is to index the date-time
as a string (yyyyMMddHHmmss or yyyyMMdd), depending on your requirements.
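
A minimal sketch of this string approach, using only the JDK and the minute resolution from the original question: fixed-width, zero-padded keys compare lexicographically in the same order as the timestamps they encode, so plain string comparison can serve for both sorting and range checks. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Encodes timestamps as fixed-width strings whose string order is time order.
final class SortableDateString {
    static final DateTimeFormatter MINUTE_KEY = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Formats a timestamp as a 12-character key such as "200901131007". */
    static String toKey(LocalDateTime t) {
        return t.format(MINUTE_KEY);
    }

    /** Inclusive range check that compares the string keys only. */
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(toKey(LocalDateTime.of(2009, 12, 1, 0, 0)));
        keys.add(toKey(LocalDateTime.of(2009, 1, 13, 10, 7)));
        keys.add(toKey(LocalDateTime.of(2009, 1, 13, 9, 59)));

        // Natural String ordering yields chronological order.
        Collections.sort(keys);
        System.out.println(keys);

        // Select everything on 2009-01-13.
        for (String k : keys) {
            System.out.println(k + " in range: " + inRange(k, "200901130000", "200901132359"));
        }
    }
}
```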

Prabin
toostep.com
