Posted to solr-user@lucene.apache.org by christophe <ch...@lemoine-fr.com> on 2008/10/17 19:28:52 UTC
Sorting performance
Hi,
I'm doing some tests with Solr 1.3.
I have loaded around 7M documents, each with a few stored and indexed
fields.
This query: text:sometext returns the results, sorted by score, in a few
milliseconds (I display 10 of the 8747 matching documents).
This one: text:sometext;id desc takes something like 60s or more to
return the data (when it doesn't fail with an out of memory error). (id
is a string type.)
I have tried returning only the id field, with the same result.
Any ideas? I'm sure I'm doing something wrong.
My schema is based on the example schema, with the following fields:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="url" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="true" />
<field name="tag" type="string" indexed="true" stored="true" multiValued="true" />
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<dynamicField name="*" type="ignored" />
Thanks
Christophe
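A rough way to see why the id-sorted query behaves so differently from the score-sorted one: sorting on a string field makes Lucene build a sort cache (the FieldCache) for that field over the whole index the first time it is used, so its size depends on the number of documents, not on the 8747 matches. The estimate below is a back-of-the-envelope sketch, assuming the Lucene 2.x string sort cache holds one int ordinal per document (N), one reference per distinct term (U), and one Java String per distinct term. Because id is presumably the unique key, U = N here. The reference size r = 4 bytes, the per-string overhead c = 40 bytes and the average id length L = 16 characters are hypothetical values, not figures from this thread.

$$
\begin{aligned}
M &\approx 4N + rU + U\,(c + 2L) \\
  &= 4\,(7\times10^{6}) + 4\,(7\times10^{6}) + 7\times10^{6}\,(40 + 2\cdot 16)\ \text{bytes} \\
  &= 28\ \text{MB} + 28\ \text{MB} + 504\ \text{MB} \approx 560\ \text{MB}
\end{aligned}
$$

Under these assumptions a heap much below about half a gigabyte cannot hold the sort cache for id at all, which is consistent with the out of memory errors. Building it also means walking every term of the field, and the cache is rebuilt whenever a new searcher is opened, which is the problem the later replies deal with.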
Re: Sorting performance + replication of index between cores
Posted by Sreeram Vaidyanathan <nv...@live.com>.
Did you guys find a solution?
I am having a similar issue.
Setup:
One indexer box and 2 searcher boxes, each with 6 different Solr cores.
We have a lot of updates (in the range of a couple thousand items every few
minutes).
Snappuller/snapinstaller pull and commit every 5 minutes.
Query response time peaks at 60+ seconds while a new searcher is being
prepared.
I have disabled the caches (filter, query & document).
We have a strict requirement that response time stay under 10 seconds at all times.
Thanks
Sreeram
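For comparison with the cache settings above: the Solr caches are configured in the `<query>` section of solrconfig.xml. The fragment below is a sketch of the slave-side settings Ben Janicki describes in this thread (caches kept, `autowarmCount="0"`, `maxWarmingSearchers` of 1); the cache sizes are placeholders, not tuned values. Note that the Lucene FieldCache used for sorting is not one of these caches, so disabling them does not remove the cost of the first sorted query on a newly opened searcher.

```xml
<!-- solrconfig.xml, inside <query> on a search (slave) instance.
     Sizes are placeholders; autowarmCount="0" means a new searcher
     does not re-run entries from the old searcher's caches. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<!-- At most one searcher may be warming at a time; a commit that would
     open another one while warming is in progress is rejected with an error. -->
<maxWarmingSearchers>1</maxWarmingSearchers>
```

Turning autowarming off makes each snapinstaller commit cheap, at the price of colder caches right after the switch.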
--
View this message in context: http://www.nabble.com/Sorting-performance-tp20037712p25286018.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting performance + replication of index between cores
Posted by sunnyfr <jo...@gmail.com>.
Hi Christophe,
Did you find a way to fix your problem? Even with replication you will
have this problem: lots of updates means clearing the caches and managing that.
I have the same issue, and I'm wondering whether I'll have to turn off the
servers during updates.
How did you fix it?
Thanks,
sunny
christophe-2 wrote:
>
> Hi,
>
> After fully reloading my index, using a field other than a Date does not
> help that much.
> Using a warmup query avoids a slow first request, but:
> - Frequent commits mean that the Searcher is reloaded frequently
> and, as the warmup takes time, the clients must wait.
> - Warmup slows down the indexing process (I guess this is
> because the Searchers are recreated after each commit).
>
> So I'm considering, as suggested, having two instances: one for
> indexing and one for searching.
> I was wondering if there is a simple way to replicate the index within a
> single Solr server running two cores? Has anyone tested such a config?
> I guess the standard rsync-based replication could be simplified a
> lot in this case, since both indexes are on the same server.
>
> Thanks
> Christophe
>
> Beniamin Janicki wrote:
>> :so you can send your updates anytime you want, and as long as you only
>> :commit every 5 minutes (or commit on a master as often as you want, but
>> :only run snappuller/snapinstaller on your slaves every 5 minutes) your
>> :results will be at most 5 minutes + warming time stale.
>>
>> This is what I do as well (commits are done once every 5 minutes). I've
>> got a master/slave configuration. The master has all caches turned off
>> (commented out in solrconfig.xml) and maxWarmingSearchers set to only 2.
>> The index is 5 GB, Xmx = 1 GB, and committing takes around 10 seconds
>> (with the default configuration and warming it took from 30 minutes up
>> to 2 hours).
>>
>> Slave caches are configured with autowarmCount="0" and
>> maxWarmingSearchers=1, and I have new data 1 second after the snapshot is
>> done. I haven't noticed any huge delays while serving search requests.
>> Try those values; maybe they'll help in your case too.
>>
>> Ben Janicki
>>
>>
>> -----Original Message-----
>> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
>> Sent: 22 October 2008 04:56
>> To: solr-user@lucene.apache.org
>> Subject: Re: Sorting performance
>>
>>
>> : The problem is that I will have hundreds of users doing queries, and a
>> : continuous flow of documents coming in.
>> : So a delay in warming up a cache "could" be acceptable if I do it a few
>> : times per day, but not too often (right now, the first query that loads
>> : the cache takes 150s).
>> :
>> : However, I'm not sure why it seems not to be a good idea to update the
>> : caches.
>>
>> you can refresh the caches automatically after updating: the "newSearcher"
>> event is fired whenever a searcher is opened (but before it's used by
>> clients), so you can configure warming queries for it. It doesn't have to
>> be done manually (or by the first user to use that reader).
>>
>> so you can send your updates anytime you want, and as long as you only
>> commit every 5 minutes (or commit on a master as often as you want, but
>> only run snappuller/snapinstaller on your slaves every 5 minutes) your
>> results will be at most 5 minutes + warming time stale.
>>
>>
>> -Hoss
>>
>>
>
>
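What Hoss describes can be expressed with `QuerySenderListener` entries in the `<query>` section of solrconfig.xml. The fragment below is a sketch, assuming the slow query is the sorted one from the start of this thread; `text:sometext` and `id desc` are stand-ins for whatever queries and sorts your users actually run. Running the sorted query here populates the sort cache for `id` before the new searcher serves requests, so the cost is paid during warming rather than by the first user.

```xml
<!-- Fired each time a new searcher is opened (e.g. after a commit or
     snapinstaller run), before it is used to serve requests. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<!-- Fired once for the first searcher after startup, when there is
     no previous searcher whose caches could be autowarmed. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

As Christophe points out, warming makes each commit take longer before new results become visible, which is why it is paired here with infrequent commits (every 5 minutes) and a low maxWarmingSearchers.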
--
View this message in context: http://www.nabble.com/Sorting-performance-tp20037712p23094174.html