Posted to java-user@lucene.apache.org by Michael Celona <mc...@criticalmention.com> on 2005/03/10 14:45:43 UTC
search performance
I have a large index that needs to yield very fast query times. I am
sorting by date by default, since I am interested in the most recent
documents. I was wondering whether boosting the score of my documents in
proportion to their date, instead of sorting, would improve search
performance. Thoughts?
Thanks,
Michael
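A minimal sketch of the date-proportional boost asked about above, assuming exponential decay with a tunable half-life. The thread does not specify a formula; `RecencyBoost`, `recencyBoost`, and the 30-day half-life are hypothetical choices for illustration only.

```java
// Hypothetical helper (not part of Lucene): turns a document's epoch time in
// seconds into an index-time boost that halves every halfLifeDays.
final class RecencyBoost {
    private static final long SECONDS_PER_DAY = 86_400L;

    static float recencyBoost(long epochSeconds, long nowSeconds, double halfLifeDays) {
        if (halfLifeDays <= 0) {
            throw new IllegalArgumentException("halfLifeDays must be positive");
        }
        // Clamp documents dated in the future to age 0 so they never exceed 1.0.
        long ageSeconds = Math.max(0L, nowSeconds - epochSeconds);
        double ageDays = (double) ageSeconds / SECONDS_PER_DAY;
        // Exponential decay: 1.0 for brand-new documents, 0.5 after one half-life.
        return (float) Math.pow(0.5, ageDays / halfLifeDays);
    }

    public static void main(String[] args) {
        long now = 1110816121L; // sample epoch value quoted in this thread
        System.out.println(recencyBoost(now, now, 30.0));                // 1.0
        System.out.println(recencyBoost(now - 30 * 86_400L, now, 30.0)); // 0.5
        System.out.println(recencyBoost(now - 60 * 86_400L, now, 30.0)); // 0.25
    }
}
```

In Lucene the returned value would be passed to `Document.setBoost(float)` before the document is added. Two caveats: index-time boosts are folded into the field norms, which Lucene encodes in a single byte, so documents only hours apart will probably end up with the same effective boost; and a boost only shifts relevance scores, so results are not guaranteed to come back in strict date order.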
RE: search performance
Posted by Michael Celona <mc...@criticalmention.com>.
My epoch looks like 1110816121, but it is represented as a string.
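Decoding the sample value is a quick way to check that it is in seconds, and to confirm it fits in a signed 32-bit int, which the int sort suggested later in this thread relies on. A sketch using `java.time` from a modern JDK; the class and method names are hypothetical.

```java
import java.time.Instant;

// Sanity check on the sample epoch value quoted in this thread.
final class EpochCheck {
    static long parseEpoch(String stored) {
        return Long.parseLong(stored);
    }

    // True if the value can be held in a signed 32-bit int.
    static boolean fitsInInt(long epochSeconds) {
        return epochSeconds >= Integer.MIN_VALUE && epochSeconds <= Integer.MAX_VALUE;
    }

    // Interprets the value as seconds since 1970-01-01T00:00:00Z.
    static String toUtc(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    public static void main(String[] args) {
        long t = parseEpoch("1110816121");
        System.out.println(toUtc(t));                 // 2005-03-14T16:02:01Z
        System.out.println(fitsInInt(t));             // true
        System.out.println(toUtc(Integer.MAX_VALUE)); // 2038-01-19T03:14:07Z
    }
}
```

Values stay within int range until 2038-01-19T03:14:07Z; after that a long would be needed.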
-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Thursday, March 17, 2005 11:41 AM
To: java-user@lucene.apache.org
Subject: Re: search performance
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: search performance
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 17, 2005, at 11:13 AM, Michael Celona wrote:
> Epoch is in seconds...
But you still haven't provided the *type* of epoch. Is it a Date? A
String? What do the string values look like?
> I am also forced to use a date filter on most of
> my searches... how bad is the performance hit of that?
Only testing will tell. The cost of a filter is paid the first time (as
long as you cache the filter and reuse the same IndexReader), so it's not
likely to be a factor over many queries.
Erik
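The "pay once per reader" behavior can be sketched as a cache keyed by the reader instance: the first query against a reader scans every document to build the filter's bit set, and later queries against the same reader reuse it. Lucene's `CachingWrapperFilter` follows a similar idea, but the class below is a simplified, hypothetical stand-in: an `Object` plays the role of the `IndexReader`, and an `int[]` of epoch seconds stands in for the indexed date field.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Simplified sketch of a cached date-range filter. The key stands in for an
// IndexReader: as long as the same reader instance is reused, the BitSet is
// computed once and then shared by every later query.
final class CachedDateRangeFilter {
    private final int fromEpoch;  // inclusive lower bound, epoch seconds
    private final int toEpoch;    // inclusive upper bound, epoch seconds
    private final Map<Object, BitSet> cache = new WeakHashMap<>();
    private int computations = 0; // how many times bits were actually built

    CachedDateRangeFilter(int fromEpoch, int toEpoch) {
        this.fromEpoch = fromEpoch;
        this.toEpoch = toEpoch;
    }

    // Assumes the same reader key always comes with the same epochsByDoc data.
    synchronized BitSet bits(Object readerKey, int[] epochsByDoc) {
        BitSet cached = cache.get(readerKey);
        if (cached != null) {
            return cached; // cheap path: same reader as before
        }
        // Expensive path: scan every document once for this reader.
        BitSet bits = new BitSet(epochsByDoc.length);
        for (int doc = 0; doc < epochsByDoc.length; doc++) {
            int t = epochsByDoc[doc];
            if (t >= fromEpoch && t <= toEpoch) {
                bits.set(doc);
            }
        }
        computations++;
        cache.put(readerKey, bits);
        return bits;
    }

    synchronized int computations() {
        return computations;
    }

    public static void main(String[] args) {
        int[] epochs = {1109000000, 1110000000, 1110816121};
        Object reader = new Object(); // stand-in for one open IndexReader
        // The 7 days (604800 seconds) ending at the sample epoch value.
        CachedDateRangeFilter lastWeek = new CachedDateRangeFilter(1110211321, 1110816121);
        System.out.println(lastWeek.bits(reader, epochs)); // {2}
        System.out.println(lastWeek.bits(reader, epochs)); // {2}, served from the cache
        System.out.println(lastWeek.computations());       // 1
    }
}
```

Opening a new reader (for example, after the index changes) produces a new key, so the bits are rebuilt once for that reader; the `WeakHashMap` lets entries for discarded readers be garbage-collected.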
RE: search performance
Posted by Michael Celona <mc...@criticalmention.com>.
Epoch is in seconds... I am also forced to use a date filter on most of
my searches... how bad is the performance hit of that?
-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Thursday, March 17, 2005 9:54 AM
To: java-user@lucene.apache.org
Subject: Re: search performance
Re: search performance
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Is epoch a Date or a String? If a String, what format is it?
Sorting by a Date keyword field sorts by String value, which is a
fair bit more resource-intensive than sorting numerically.
Try using a purely numeric field (still stored as a String) whose values
can be represented as an int; be sure to specify the sort type as int, and
see if that improves performance. I'm pretty certain you'd still get
better performance by using a boost than a sort, though.
Erik
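The int sort suggested above can be approximated without Lucene: parse each document's epoch string once into a primitive `int[]` indexed by document number, then order documents by comparing ints instead of Strings. In Lucene 1.4 the corresponding request would be along the lines of `new Sort(new SortField("epoch_time", SortField.INT, true))`, where `true` reverses the order so the newest documents come first. The class below is a hypothetical illustration of the idea, not Lucene's internals.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for an int sort over a keyword field: values are
// parsed once into a primitive array indexed by document number.
final class IntFieldSort {

    static int[] parseEpochs(String[] storedValues) {
        int[] epochs = new int[storedValues.length];
        for (int doc = 0; doc < storedValues.length; doc++) {
            // Integer.parseInt throws if a value does not fit in 32 bits.
            epochs[doc] = Integer.parseInt(storedValues[doc]);
        }
        return epochs;
    }

    // Returns document numbers ordered newest first, comparing ints only.
    static Integer[] newestFirst(int[] epochs) {
        Integer[] docs = new Integer[epochs.length];
        for (int doc = 0; doc < docs.length; doc++) {
            docs[doc] = doc;
        }
        Arrays.sort(docs, Comparator.comparingInt((Integer doc) -> epochs[doc]).reversed());
        return docs;
    }

    public static void main(String[] args) {
        String[] stored = {"1110000000", "1110816121", "1109000000"};
        int[] epochs = parseEpochs(stored);
        System.out.println(Arrays.toString(newestFirst(epochs))); // [1, 0, 2]
    }
}
```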
On Mar 17, 2005, at 8:59 AM, Michael Celona wrote:
RE: search performance
Posted by Michael Celona <mc...@criticalmention.com>.
I am sorting against an epoch time stored in my index, added with:
contactDocument.add( Field.Keyword( "epoch_time", epoch ) );
Then I sort by this field. My search time is on the order of 3 seconds on
an index of about 6 GB, using simple searches against a text field. By
using boosts instead, I was hoping to improve performance. Do you think
this will make a big difference?
Michael
-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Tuesday, March 15, 2005 8:43 AM
To: java-user@lucene.apache.org
Subject: Re: search performance
Re: search performance
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I've been effectively offline for a few days, so I'm not sure whether
anyone has replied on this thread yet.
Using boosts will definitely use fewer resources than sorting. If you
do sort by date, be sure you're doing it numerically rather
than lexicographically.
Erik
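The difference between the two orderings shows up as soon as values have different widths, because String comparison goes character by character. A sketch with hypothetical names; zero-padding to a fixed width is one common fix, not something proposed in this thread.

```java
import java.util.Arrays;
import java.util.Locale;

final class OrderDemo {
    // Lexicographic sort: compares character by character.
    static String[] sortedAsStrings(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Numeric sort: parses each value and compares the numbers.
    static long[] sortedAsNumbers(String[] values) {
        long[] nums = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            nums[i] = Long.parseLong(values[i]);
        }
        Arrays.sort(nums);
        return nums;
    }

    // Hypothetical helper: left-pad to 10 digits so string order matches
    // numeric order for non-negative values below 10^10.
    static String pad(long value) {
        return String.format(Locale.ROOT, "%010d", value);
    }

    public static void main(String[] args) {
        String[] raw = {"1110816121", "999999999"};
        System.out.println(Arrays.toString(sortedAsStrings(raw))); // [1110816121, 999999999]
        System.out.println(Arrays.toString(sortedAsNumbers(raw))); // [999999999, 1110816121]
        String[] padded = {pad(1110816121L), pad(999999999L)};
        System.out.println(Arrays.toString(sortedAsStrings(padded))); // [0999999999, 1110816121]
    }
}
```

Every epoch-seconds value from 1000000000 (September 2001) up to the year 2286 has exactly 10 digits, so keyword values like 1110816121 already sort correctly as strings; the argument for a numeric sort there is cost, not correctness.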