Posted to java-user@lucene.apache.org by moshe <mo...@egis-software.com> on 2007/09/08 12:41:07 UTC

Performance Questions

I have a couple of questions regarding the performance of Lucene. First off, my
environment:

Data
1-10M Documents
5 - 30 fields < 10B
1-3 Fields 1KB - 500KB

I have three types of queries:

Query 1 : 85% usage 
1-2 phrase terms, e.g. +id:"651" +id2:"241"
sorting by an arbitrary field, normally the date
5-20 security terms
5k-1M results
can never return stale data

Query 2:  13%
10 full wildcard terms, e.g. *search*
sorting is optional
0-200 results
20-200 security terms
can return slightly stale data

Query 3: 2%
1-20 mixed terms
sorting is optional
0-200 results
20-200 security terms
can return slightly stale data

1) Does re-opening an IndexSearcher flush all of the caches (filter and
sort) ? 

2) What is the overhead of opening an IndexSearcher? What does it depend on?

3) What is the recommended approach for updating and refreshing the index
where there is 1 update for every 5 queries? 

4) Is query 1 better off done using a database, as I would have to re-open
the IndexSearcher every couple of queries?

5) What would perform better, Solr or Lucene? When is it better to use one
or the other?

6) What else should I look out for?

7) Why is refreshing an IndexSearcher not supported? 


Any help is greatly appreciated 
Thanks
Moshe 


 

-- 
View this message in context: http://www.nabble.com/Performance-Questions-tf4405513.html#a12568500
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Performance Questions

Posted by testn <te...@doramail.com>.
- The searcher itself doesn't cost much. The cost comes from the construction
of the TermInfosReader by the IndexReader.
- This means you can construct a number of searchers based on different
combinations of indices.
- If I were you, I would construct a number of indices based on how fresh
their data needs to be.
    - Reopen the indices that must never be stale very often, using
IndexReader.isCurrent() to help you
    - Reopen the indices where stale data is OK less often, say every 10-20
minutes
- Then when you want to search, you just need to construct an IndexSearcher
using a MultiReader/ParallelReader over the indices above
- Make sure you close the stale readers once you have opened their updated
replacements
- If that's not enough and running HEAD code doesn't scare you, you can
try the HEAD code, which has LUCENE-743 implemented.
- For frequently updated data, it is probably better to use a database,
especially if you don't need the scoring and keyword analysis capabilities,
since it's pretty costly to reopen an IndexReader every time the data has
been updated.
- IndexSearcher doesn't support refreshing because it relies on an
IndexReader to do the work. The caching of terms is done inside
IndexReader/TermInfosReader. So if you want to update an IndexSearcher, you
need to reopen it with a more up-to-date IndexReader.
- To get the best performance, you should really query just the data you
need. 
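
The reopen policy in the list above can be sketched roughly as follows. This is
only an illustration in plain Java, not Lucene code: VersionedReader,
ReaderFactory, RefreshingHolder, FakeIndex and FakeReader are all hypothetical
names made up for this sketch. VersionedReader stands in for the two
IndexReader operations the policy needs (isCurrent() and close()), and the
factory stands in for whatever opens a new reader. A holder with a staleness
window of 0 checks on every call (the "never stale" index), while a window of
10 minutes checks only that often.

```java
// Hypothetical stand-in for the part of IndexReader this sketch relies on.
interface VersionedReader {
    boolean isCurrent(); // true while the underlying index has not changed
    void close();        // release files and cached term data
}

// Hypothetical factory; real code would open an IndexReader here.
interface ReaderFactory<R extends VersionedReader> {
    R open();
}

// Holds one reader and replaces it when it is stale and the policy allows a check.
final class RefreshingHolder<R extends VersionedReader> {
    private final ReaderFactory<R> factory;
    private final long maxStaleMillis; // 0 means "check on every call"
    private R current;
    private long lastCheckMillis;

    RefreshingHolder(ReaderFactory<R> factory, long maxStaleMillis, long nowMillis) {
        this.factory = factory;
        this.maxStaleMillis = maxStaleMillis;
        this.current = factory.open();
        this.lastCheckMillis = nowMillis;
    }

    // Returns a reader that is no staler than the policy allows.
    synchronized R get(long nowMillis) {
        if (nowMillis - lastCheckMillis >= maxStaleMillis) {
            lastCheckMillis = nowMillis;
            if (!current.isCurrent()) {
                R fresh = factory.open(); // the expensive step
                // Simplification: real code must wait for in-flight searches
                // on the old reader before closing it.
                current.close();
                current = fresh;
            }
        }
        return current;
    }
}

// Toy index used only to exercise the holder; not part of any library.
final class FakeIndex {
    int version = 0;
    int opens = 0;

    void update() { version++; }

    FakeReader open() {
        opens++;
        return new FakeReader(this, version);
    }
}

final class FakeReader implements VersionedReader {
    private final FakeIndex index;
    private final int openedAtVersion;
    boolean closed = false;

    FakeReader(FakeIndex index, int openedAtVersion) {
        this.index = index;
        this.openedAtVersion = openedAtVersion;
    }

    public boolean isCurrent() { return index.version == openedAtVersion; }

    public void close() { closed = true; }
}
```

For the setup in the question, the index behind query 1 would sit in a holder
created with a window of 0, and the indices behind queries 2 and 3 in holders
with a window of 10-20 minutes. The time is passed in explicitly only to keep
the sketch easy to exercise; real code could read the system clock instead.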




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org