Posted to java-user@lucene.apache.org by "Michael J. Prichard" <mi...@mac.com> on 2006/07/26 15:47:51 UTC

Timestamps as milliseconds

I am working on indexing emails and have stored the date as 
milliseconds.  I was thinking of using a filter w/ my search that would 
only return the emails in that date range.  I am currently indexing as 
follows:

doc.add(new Field("date", itemContent.get("date").toString(), 
Field.Store.YES, Field.Index.UN_TOKENIZED));

Does this look like a good approach to you all?

Thanks,
Michael


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Timestamps as milliseconds

Posted by Erick Erickson <er...@gmail.com>.
Two ideas:

1> store a second field that contains the time resolution you need, and sort
by that. You can still search (quickly) by the day-resolution field.
2> If you KNOW that you are indexing the e-mails in time-order, then sorting
by doc_id will preserve the time ordering.

Erick

Re: Timestamps as milliseconds

Posted by Miles Barr <mi...@magpie.net>.
Michael J. Prichard wrote:

> I guess the more I think about it I don't really care about the 
> minutes in the initial.  All that matters is the date (i.e. 
> 2006-07-25).  The only thing I would need the time for would be for 
> sorting so I need to have that too.  Ideas?
>

Store as much detail as you need to sort by. For display purposes just 
use java.text.SimpleDateFormat to only show the date and not the time.
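
A minimal sketch of the display side, assuming the stored value can be read back as epoch milliseconds. The class name, method name, sample timestamp, and the choice of GMT are all illustrative assumptions, not anything from the original post:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DisplayDate {
    // Formats a millisecond timestamp as a date only, dropping the time part.
    // GMT is an assumption made here so the result does not depend on the
    // machine's default time zone.
    static String dateOnly(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long stored = 1153842471000L; // hypothetical stored timestamp
        System.out.println(dateOnly(stored));
    }
}
```

The full-precision value stays in the index for sorting; only the formatted string shown to the user loses the time.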



Miles


Re: Timestamps as milliseconds

Posted by Miles Barr <mi...@magpie.net>.
Erick Erickson wrote:

> As Miles said, use the DateTools (lucene) class with a DAY resolution.
> That'll give you a YYYYMMDD format, which won't blow your query with a
> "TooManyClauses" exception.
>
> Remember that Lucene deals with strings, so you want to store things in
> an easily manipulated string format, often one that's suitable for
> comparison, which is what you want when you create a RangeQuery.


I think this is the way to go: have one field at DAY resolution for the 
RangeQuery and one field at MILLISECOND resolution for sorting.

The reason you want the coarsest resolution possible for a RangeQuery is 
that it works the same way as any other query: it tries to match tokens. 
To do this it enumerates the indexed terms that fall between the two end 
points, so if there's a week between the two dates and the resolution is 
DAY, that's just seven values it tries to match against. If the 
resolution is MILLISECOND, there could be up to 604,800,000 distinct 
values in that range.
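
The arithmetic behind those numbers can be sketched as below. This is only an upper-bound calculation on how many distinct values a one-week span can hold at each resolution; the class and method names are hypothetical. For comparison, the "TooManyClauses" limit in Lucene of this era defaults, as far as I know, to 1024 clauses, so even MINUTE resolution can exceed it over a week if the index is dense enough.

```java
class RangeSize {
    static final long MILLIS_PER_MINUTE = 60L * 1000L;
    static final long MILLIS_PER_DAY = 24L * 60L * MILLIS_PER_MINUTE;

    // Upper bound on the number of distinct values inside a span when every
    // timestamp is rounded down to a multiple of unitMillis.
    static long maxDistinctValues(long spanMillis, long unitMillis) {
        return spanMillis / unitMillis;
    }

    public static void main(String[] args) {
        long week = 7L * MILLIS_PER_DAY;
        System.out.println("DAY:         " + maxDistinctValues(week, MILLIS_PER_DAY));
        System.out.println("MINUTE:      " + maxDistinctValues(week, MILLIS_PER_MINUTE));
        System.out.println("MILLISECOND: " + maxDistinctValues(week, 1L));
    }
}
```

The real number of clauses depends on how many distinct terms the index actually contains in the range, which is at most these figures.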



Miles



Re: Timestamps as milliseconds

Posted by Erick Erickson <er...@gmail.com>.
As Miles said, use the DateTools (lucene) class with a DAY resolution.
That'll give you a YYYYMMDD format, which won't blow your query with a
"TooManyClauses" exception.

Remember that Lucene deals with strings, so you want to store things in
an easily manipulated string format, often one that's suitable for comparison,
which is what you want when you create a RangeQuery.
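
To illustrate why a YYYYMMDD string suits range comparison, here is a small sketch that uses java.text.SimpleDateFormat as a stand-in for DateTools at DAY resolution (it does not call Lucene). The class name, helper names, and the GMT time zone are assumptions made for the example:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DayKeys {
    // Stand-in for a DAY-resolution key: an 8-character yyyyMMdd string.
    static String dayKey(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT")); // assumption: keys are in GMT
        return fmt.format(new Date(millis));
    }

    // Plain string comparison: because every key has the same width and the
    // most significant part comes first, string order equals date order.
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String key = dayKey(1153842471000L); // hypothetical email timestamp
        System.out.println(key + " in range: " + inRange(key, "20060720", "20060726"));
    }
}
```

A range over strings like these only ever has one candidate value per day, however many emails arrived on that day.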

Erick

Re: Timestamps as milliseconds

Posted by "Michael J. Prichard" <mi...@mac.com>.
Michael J. Prichard wrote:

> I guess the more I think about it I don't really care about the 
> minutes in the initial.  All that matters is the date (i.e. 
> 2006-07-25).  The only thing I would need the time for would be for 
> sorting so I need to have that too.  Ideas?
>
> Thanks!
> Michael
>

On this note... I want to do a RangeQuery on the date (but I only care 
about YYYYMMDD).  What's the best way to index that?  I plan on storing 
the timestamp for sorting only.

Thanks.


Re: Timestamps as milliseconds

Posted by "Michael J. Prichard" <mi...@mac.com>.
Miles Barr wrote:

> Michael J. Prichard wrote:
>
>> I am working on indexing emails and have stored the date as 
>> milliseconds.  I was thinking of using a filter w/ my search that 
>> would only return the emails in that date range.  I am currently 
>> indexing as follows:
>>
>> doc.add(new Field("date", itemContent.get("date").toString(), 
>> Field.Store.YES, Field.Index.UN_TOKENIZED));
>>
>> Does this look like a good approach to you all?
>>
>
> Using milliseconds as your resolution will make range searches very 
> slow, since it has to enumerate so many values. I suggest using at 
> most minutes instead.
>
> But either way I suggest using DateTools rather than using a Date 
> object's toString() form, i.e.:
>
> doc.add(new Field("date", 
> DateTools.dateToString(itemContent.get("date"), 
> DateTools.Resolution.MILLISECOND), Field.Store.YES, 
> Field.Index.UN_TOKENIZED));
>
>
>
>
> Miles
>
I guess the more I think about it I don't really care about the minutes 
in the initial.  All that matters is the date (i.e. 2006-07-25).  The 
only thing I would need the time for would be for sorting so I need to 
have that too.  Ideas?

Thanks!
Michael


Re: Timestamps as milliseconds

Posted by Miles Barr <mi...@magpie.net>.
Michael J. Prichard wrote:

> I am working on indexing emails and have stored the date as 
> milliseconds.  I was thinking of using a filter w/ my search that 
> would only return the emails in that date range.  I am currently 
> indexing as follows:
>
> doc.add(new Field("date", itemContent.get("date").toString(), 
> Field.Store.YES, Field.Index.UN_TOKENIZED));
>
> Does this look like a good approach to you all?
>

Using milliseconds as your resolution will make range searches very 
slow, since it has to enumerate so many values. I suggest using at most 
minutes instead.

But either way I suggest using DateTools rather than using a Date 
object's toString() form, i.e.:

doc.add(new Field("date", 
DateTools.dateToString(itemContent.get("date"), 
DateTools.Resolution.MILLISECOND), Field.Store.YES, 
Field.Index.UN_TOKENIZED));
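
One reason a fixed-width date string is safer than a raw toString() value can be sketched as follows, assuming the stored value is a Long of epoch milliseconds. The yyyyMMddHHmmssSSS pattern here is built with SimpleDateFormat as a stand-in for a MILLISECOND-resolution key, and the class and method names are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class SortableKeys {
    // Fixed-width, most-significant-first key, so string order is time order.
    static String milliKey(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmssSSS");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT")); // assumption: keys are in GMT
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long earlier = 999999999999L;  // 12 digits
        long later = 1000000000000L;   // 13 digits, one millisecond later

        // Raw decimal strings: a positive result means the earlier timestamp
        // sorts after the later one, because "9..." > "1..." as text.
        System.out.println(Long.toString(earlier).compareTo(Long.toString(later)));

        // Fixed-width keys: a negative result means the earlier timestamp
        // sorts first, as expected.
        System.out.println(milliKey(earlier).compareTo(milliKey(later)));
    }
}
```

Raw millisecond strings only misorder when the digit counts differ, but a mail archive that reaches back before September 2001 would hit exactly that case.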




Miles
