Posted to java-user@lucene.apache.org by David Seltzer <ds...@TVEyes.com> on 2007/03/20 20:39:50 UTC
Sort Performance Question
Hi All,
I have a sort performance question:
I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date. The search front-end uses a
ParallelMultiSearcher to search up to three indexes at a time (each
index contains a month of live data). When I search for the word "toast"
(for example) sorted by score, the results come back in about 1200ms;
when I sort by DateTime, they come back in 3900ms.
Initially I was sorting based on a unixtime field, but having read up on
it, I switched to a slightly easier format: "yyyyMMddHHmm". Now this
value is still larger than an int, so I went one step farther and
created two more fields for test purposes: SortDate, which is yyyyMMdd
and SortTime which is HHmm. When I sort by SortDate then SortTime the
results come in even slower, around 4300ms.
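To make the int limit concrete, here is a minimal Java sketch (not part of the original post) that formats the posting time with the three patterns and checks whether each result fits in a 32-bit int. The names SortKeyWidth, numericKey and fitsInInt are made up for illustration, and the combined pattern is assumed to use lowercase dd (day of month).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Illustration only: shows which of the three sort-key formats fit in a 32-bit int.
final class SortKeyWidth {

    // Formats the date with the given pattern and parses the resulting digits as a long.
    static long numericKey(String pattern, Date date) {
        return Long.parseLong(new SimpleDateFormat(pattern, Locale.ROOT).format(date));
    }

    // True when the key can be stored in an int without overflow.
    static boolean fitsInInt(long key) {
        return key >= Integer.MIN_VALUE && key <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 20 March 2007, 20:39 local time (months are zero-based in GregorianCalendar).
        Date d = new GregorianCalendar(2007, 2, 20, 20, 39).getTime();
        for (String pattern : new String[] {"yyyyMMddHHmm", "yyyyMMdd", "HHmm"}) {
            long key = numericKey(pattern, d);
            System.out.println(pattern + " -> " + key + ", fits in int: " + fitsInInt(key));
        }
    }
}
```

The combined key for that time is 200703202039, well above Integer.MAX_VALUE (2147483647), while the separate date and time keys both fit.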
To summarize:
//The sorting fields look like this:
new Field("SortDateTime", sdfDateTime.format(dMySortDateTime),
    Field.Store.YES, Field.Index.UN_TOKENIZED);
new Field("SortDate", sdfDate.format(dMySortDateTime),
    Field.Store.YES, Field.Index.UN_TOKENIZED);
new Field("SortTime", sdfTime.format(dMySortDateTime),
    Field.Store.YES, Field.Index.UN_TOKENIZED);
//and the performance looks like this:
//sort by score
Sort sSortOrder = Sort.RELEVANCE; //1200ms
//sort by datetime
Sort sSortOrder = new Sort("SortDateTime", true); //3900ms
//sort by date then time
Sort sSortOrder = new Sort(new SortField[] {
    new SortField("SortDate", SortField.INT, bReverse),
    new SortField("SortTime", SortField.INT, bReverse)}); //4300ms
The two indexes that are being searched at the moment look like this:
Index 1:
Index Path: /storage/unisearch/MMS_index/2007.02/
Index Size on Disk: 1,400,569 KB
Number of Records: 2682238
Index Version: 03/13/2007
Index 2:
Index Path: /storage/unisearch/MMS_index/2007.03/
Index Size on Disk: 2,055,199 KB
Number of Records: 3457434
Index Version: 03/13/2007
The search is being performed in tomcat and I'm running:
org.apache.lucene - build 2007-02-14 on a Dual 3.4GHz Xeon w/ 2GB memory
and Red Hat 3.4.3-22.
So, on to the question: is this fast, slow, or normal?
Along with the obvious follow-up: if it's slow, how can I make it
faster?
Thanks for your help!
-Dave
Re: Sort Performance Question
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Are you using a cached IndexSearcher such that successive sorts on
the same field will be more efficient?
Erik
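As far as I know, Lucene caches the values it sorts on per open index reader, so a searcher that is opened for every request pays the cache-loading cost each time. A minimal Java sketch of the "open once, reuse" idea follows. SharedInstance is a hypothetical helper, not a Lucene class, and the Supplier stands in for whatever code actually opens the searcher.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustration only: opens an expensive resource once and hands out the same instance afterwards.
final class SharedInstance<T> {
    private final Supplier<T> opener;
    private T instance;

    SharedInstance(Supplier<T> opener) {
        this.opener = opener;
    }

    // Opens the resource on the first call; later calls return the same object.
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        // Stand-in for "open a searcher on an index path"; a plain Object is used here.
        SharedInstance<Object> shared = new SharedInstance<>(() -> {
            opens.incrementAndGet();
            return new Object();
        });
        Object first = shared.get();
        Object second = shared.get();
        System.out.println("same instance: " + (first == second) + ", opens: " + opens.get());
    }
}
```

In a Tomcat application the holder would typically live in application scope so that every request shares it. Reopening after the index changes is left out of this sketch.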
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Sort Performance Question
Posted by "Peter W." <pe...@marketingbrokers.com>.
Hello,
The response time for sorts depends on the number of results.
If you don't need all documents returned you could use a filter.
One idea would be to use DateTools to save your dates as Strings
and build your query with FilteredQuery passing in a custom filter
to search this field.
The filter would be constructed using two RangeFilters setting upper
and lower date boundaries (Strings) combined using NumberTools and
ChainedFilter.
With a subset of your matching results sorting should be much faster.
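The idea can be sketched without Lucene: fixed-width date strings such as yyyyMMddHHmm compare in chronological order, so a lower and an upper String bound select a date range, and only that subset needs sorting. The Java below is an illustration under that assumption. DateRangeSubset, inRange and newestFirst are made-up names, and the sample keys are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: restrict fixed-width date keys to a range, then sort just that subset.
final class DateRangeSubset {

    // Keeps keys with lower <= key <= upper, using plain String comparison.
    static List<String> inRange(List<String> keys, String lower, String upper) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0) {
                kept.add(key);
            }
        }
        return kept;
    }

    // Sorts newest first; for fixed-width keys this is reverse lexicographic order.
    static List<String> newestFirst(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Invented sample keys in yyyyMMddHHmm form.
        List<String> keys = Arrays.asList(
            "200702281130", "200703150900", "200703012359", "200703202039");
        // Only March 2007 is kept, so the February key never reaches the sort.
        List<String> march = inRange(keys, "200703010000", "200703312359");
        System.out.println(newestFirst(march));
    }
}
```

The saving comes from the sort seeing fewer items. In Lucene the range restriction would be expressed as a filter rather than a loop over keys.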
Regards,
Peter W.