Posted to java-user@lucene.apache.org by "Michael J. Prichard" <mi...@mac.com> on 2006/07/26 15:47:51 UTC
Timestamps as milliseconds
I am working on indexing emails and have stored the date as
milliseconds. I was thinking of using a filter with my search that would
only return the emails in that date range. I am currently indexing as
follows:
doc.add(new Field("date", itemContent.get("date").toString(),
    Field.Store.YES, Field.Index.UN_TOKENIZED));
Does this look like a good approach to you all?
Thanks,
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Timestamps as milliseconds
Posted by Erick Erickson <er...@gmail.com>.
two ideas:
1> store a second field that contains the time resolution you need, and sort
by that. You can still search (quickly) by the day-resolution field.
2> If you KNOW that you are indexing the e-mails in time-order, then sorting
by doc_id will preserve the time ordering.
Erick
Re: Timestamps as milliseconds
Posted by Miles Barr <mi...@magpie.net>.
Michael J. Prichard wrote:
> I guess the more I think about it I don't really care about the
> minutes in the initial. All that matters is the date (i.e.
> 2006-07-25). The only thing I would need the time for would be for
> sorting so I need to have that too. Ideas?
>
Store as much detail as you need to sort by. For display purposes just
use java.text.SimpleDateFormat to only show the date and not the time.
Miles
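A minimal sketch of the display step described above, using only the JDK. The class name DisplayDate, the method dateOnly, and the sample timestamp are hypothetical; the time zone is passed in explicitly so the output does not depend on the machine's default zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class DisplayDate {
    // Formats a stored millisecond timestamp as a date-only string for display.
    // The full timestamp stays in the index; only the rendering drops the time.
    static String dateOnly(long millis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        // Hypothetical stored value: 2006-07-25 15:47:51 GMT in milliseconds.
        long stored = 1153842471000L;
        System.out.println(dateOnly(stored, "GMT"));
    }
}
```

SimpleDateFormat is not thread-safe, which is why the sketch creates a new instance per call instead of sharing one.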
Re: Timestamps as milliseconds
Posted by Miles Barr <mi...@magpie.net>.
Erick Erickson wrote:
> As Miles said, use the DateTools (lucene) class with a DAY resolution.
> That'll give you a YYYYMMDD format, which won't blow your query with a
> "TooManyClauses" exception.......
>
> Remember that Lucene deals with strings, so you want to store things in
> easily-manipulated string format, often one that's suitable for
> comparison.
> Which is what you want to do when you create a RangeQuery
I think this is the way to go, have one field at DAY resolution for the
RangeQuery and one field at MILLISECOND resolution for sorting.
The reason you want the coarsest resolution possible for a RangeQuery is
that it works the same way as any other query: it tries to match terms.
To do this it enumerates the values between the two end points, so if
there's a week between the two dates and the resolution is DAY, that's
just seven or so values it tries to match against. If the resolution is
MILLISECOND, there are 604,800,000 possible values in that week, and
every distinct timestamp indexed in the range is another term to match.
Miles
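The arithmetic behind those numbers can be checked with a short sketch. The class and method names are hypothetical, and the figures are upper bounds on the distinct values a one-week range can contain at each resolution, not a measurement of any real index. The clause limit of 1024 is assumed to be the default for the Lucene version in use.

```java
final class RangeTermCount {
    static final long MINUTE_MS = 60_000L;
    static final long DAY_MS = 86_400_000L;
    // Assumption: the default BooleanQuery clause limit is 1024.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    // Upper bound on the number of distinct values in a span at a given resolution.
    static long possibleValues(long spanMillis, long unitMillis) {
        return spanMillis / unitMillis;
    }

    public static void main(String[] args) {
        long week = 7 * DAY_MS;
        long[] units = {DAY_MS, MINUTE_MS, 1L};
        String[] names = {"DAY", "MINUTE", "MILLISECOND"};
        for (int i = 0; i < units.length; i++) {
            long n = possibleValues(week, units[i]);
            boolean overLimit = n > ASSUMED_MAX_CLAUSES;
            System.out.println(names[i] + ": " + n + " possible values, may exceed clause limit: " + overLimit);
        }
    }
}
```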
Re: Timestamps as milliseconds
Posted by Erick Erickson <er...@gmail.com>.
As Miles said, use Lucene's DateTools class with a DAY resolution.
That'll give you a YYYYMMDD format, which won't blow up your query with a
"TooManyClauses" exception.
Remember that Lucene deals with strings, so you want to store things in an
easily manipulated string format, often one that's suitable for comparison,
which is what you want when you create a RangeQuery.
Erick
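To illustrate why a YYYYMMDD string is "suitable for comparison", here is a small JDK-only sketch. It is not Lucene code: inRange is a hypothetical helper that applies the same lexicographic comparison a string-based range relies on, and the dates are made up.

```java
final class DayRange {
    // True when lower <= day <= upper under plain string comparison.
    // This matches calendar order only because YYYYMMDD is fixed-width
    // and ordered from the most significant part (year) to the least (day).
    static boolean inRange(String day, String lower, String upper) {
        return day.compareTo(lower) >= 0 && day.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("20060725", "20060720", "20060726"));
        System.out.println(inRange("20060801", "20060720", "20060726"));
        // A day-first layout breaks the ordering: 25 July compares as
        // "greater" than 1 August because '2' sorts after '0'.
        System.out.println("25/07/2006".compareTo("01/08/2006") > 0);
    }
}
```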
Re: Timestamps as milliseconds
Posted by "Michael J. Prichard" <mi...@mac.com>.
Michael J. Prichard wrote:
> Miles Barr wrote:
>
>> Michael J. Prichard wrote:
>>
>>> I am working on indexing emails and have stored the date as
>>> milliseconds. I was thinking of using a filter with my search that
>>> would only return the emails in that date range. I am currently
>>> indexing as follows:
>>>
>>> doc.add(new Field("date",
>>> itemContent.get("date").toString(), Field.Store.YES,
>>> Field.Index.UN_TOKENIZED));
>>>
>>> Does this look like a good approach to you all?
>>>
>>
>> Using milliseconds as your resolution will make range searches very
>> slow, since it has to enumerate so many values. I suggest using a
>> resolution no finer than minutes instead.
>>
>> But either way I suggest using DateTools rather than using a Date
>> object's toString() form, i.e.:
>>
>> doc.add(new Field("date",
>> DateTools.dateToString(itemContent.get("date"),
>> DateTools.Resolution.MILLISECOND), Field.Store.YES,
>> Field.Index.UN_TOKENIZED));
>>
>> Miles
>>
> I guess the more I think about it I don't really care about the
> minutes in the initial. All that matters is the date (i.e.
> 2006-07-25). The only thing I would need the time for would be for
> sorting so I need to have that too. Ideas?
>
> Thanks!
> Michael
>
On this note, I want to do a RangeQuery on the date (but I only care
about YYYYMMDD). What's the best way to index that? I plan on storing
the timestamp for sorting only.
Thanks.
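One way to picture the two values asked about here is to derive both from the same timestamp: a day-resolution key for the range field and a millisecond-resolution key for the sort field. The sketch below uses only the JDK and hypothetical names (EmailDateKeys, dayKey, sortKey). It assumes GMT and the patterns yyyyMMdd and yyyyMMddHHmmssSSS, which should correspond to what DateTools produces at DAY and MILLISECOND resolution, but check that against your Lucene version.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class EmailDateKeys {
    // Formats a millisecond timestamp in GMT using the given pattern.
    private static String format(long millis, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    // Coarse key intended for the field used in range queries.
    static String dayKey(long millis) {
        return format(millis, "yyyyMMdd");
    }

    // Fine-grained key intended for the field used only for sorting.
    static String sortKey(long millis) {
        return format(millis, "yyyyMMddHHmmssSSS");
    }

    public static void main(String[] args) {
        long received = 1153842471000L; // hypothetical: 2006-07-25 15:47:51 GMT
        System.out.println("range field value: " + dayKey(received));
        System.out.println("sort field value:  " + sortKey(received));
    }
}
```

Each key would go into its own untokenized field, so the range query only ever touches the coarse one.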
Re: Timestamps as milliseconds
Posted by "Michael J. Prichard" <mi...@mac.com>.
Miles Barr wrote:
> Michael J. Prichard wrote:
>
>> I am working on indexing emails and have stored the date as
>> milliseconds. I was thinking of using a filter with my search that
>> would only return the emails in that date range. I am currently
>> indexing as follows:
>>
>> doc.add(new Field("date",
>> itemContent.get("date").toString(), Field.Store.YES,
>> Field.Index.UN_TOKENIZED));
>>
>> Does this look like a good approach to you all?
>>
>
> Using milliseconds as your resolution will make range searches very
> slow, since it has to enumerate so many values. I suggest using a
> resolution no finer than minutes instead.
>
> But either way I suggest using DateTools rather than using a Date
> object's toString() form, i.e.:
>
> doc.add(new Field("date",
> DateTools.dateToString(itemContent.get("date"),
> DateTools.Resolution.MILLISECOND), Field.Store.YES,
> Field.Index.UN_TOKENIZED));
>
> Miles
>
I guess the more I think about it, I don't really care about the minutes
in the initial. All that matters is the date (e.g. 2006-07-25). The
only thing I would need the time for is sorting, so I need to have
that too. Ideas?
Thanks!
Michael
Re: Timestamps as milliseconds
Posted by Miles Barr <mi...@magpie.net>.
Michael J. Prichard wrote:
> I am working on indexing emails and have stored the date as
> milliseconds. I was thinking of using a filter with my search that
> would only return the emails in that date range. I am currently
> indexing as follows:
>
> doc.add(new Field("date", itemContent.get("date").toString(),
> Field.Store.YES, Field.Index.UN_TOKENIZED));
>
> Does this look like a good approach to you all?
>
Using milliseconds as your resolution will make range searches very
slow, since it has to enumerate so many values. I suggest using a
resolution no finer than minutes instead.
But either way I suggest using DateTools rather than using a Date
object's toString() form, i.e.:
doc.add(new Field("date",
    DateTools.dateToString(itemContent.get("date"),
        DateTools.Resolution.MILLISECOND),
    Field.Store.YES, Field.Index.UN_TOKENIZED));
Miles
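A short JDK-only sketch of why the toString() form is a poor index value: its text does not sort in time order. The names below are hypothetical. toStringStyle only mimics the layout of Date.toString(), pinned to GMT and Locale.US so the result is reproducible, and sortableKey uses a fixed-width pattern assumed to resemble DateTools output at MILLISECOND resolution.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SortableKeys {
    private static String format(long millis, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    // Same layout as Date.toString(), e.g. a leading day-of-week name.
    static String toStringStyle(long millis) {
        return format(millis, "EEE MMM dd HH:mm:ss zzz yyyy");
    }

    // Fixed-width, most-significant-first key that sorts chronologically.
    static String sortableKey(long millis) {
        return format(millis, "yyyyMMddHHmmssSSS");
    }

    public static void main(String[] args) {
        long tuesday = 1153842471000L;               // 2006-07-25 15:47:51 GMT
        long thursday = tuesday + 2 * 86_400_000L;   // two days later
        // "Tue..." sorts after "Thu...", so the earlier date compares as greater.
        System.out.println(toStringStyle(tuesday).compareTo(toStringStyle(thursday)) > 0);
        // The fixed-width key keeps the earlier date first.
        System.out.println(sortableKey(tuesday).compareTo(sortableKey(thursday)) < 0);
    }
}
```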