Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2008/10/17 23:26:52 UTC

Re: Sorting performance

Is the sorted query slow only the first time or every time you run it?

You got an OOM?  What -Xmx value are you using?  Try increasing it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: christophe <ch...@lemoine-fr.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, October 17, 2008 1:28:52 PM
> Subject: Sorting performance 
> 
> Hi,
> 
> I'm doing some tests with Solr1.3
> I have loaded around 7M documents, each with a few stored and indexed 
> fields.
> 
> This query: text:sometext returns the results, sorted by score in a few 
> milliseconds. (I display 10 out of 8747 matched documents)
> This one: text:sometext;id desc   takes something like 60s or more to 
> return the data (when it doesn't fail with an out of memory error). (id 
> is a string type).
> I have tried to display only id, same results.
> 
> Any ideas ? I'm sure I'm doing something wrong.....
> 
> My schema is based on the sample, with the following fields:
> 
>   
> /> 
>   
>   
>   
>   
>   
> multiValued="true" />
>   
> default="NOW" multiValued="false"/>
>   
> 
> 
> Thanks
> Christophe

Re: query parsing issue + behavior as OR (solr 1.4-dev)

Posted by Norberto Meijome <nu...@gmail.com>.

On Mon, 20 Oct 2008 06:21:06 -0700 (PDT)
Sunil Sarje <su...@yahoo.com> wrote:

> I am working with nightly build of Oct 17, 2008  and found the issue that
> something wrong with LuceneQParserPlugin; It takes + as OR

Sunil, please do not hijack the thread :

http://en.wikipedia.org/wiki/Thread_hijacking

thanks,
B

_________________________
{Beto|Norberto|Numard} Meijome

He could be a poster child for retroactive birth control.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

Re: Sorting performance

Posted by Erick Erickson <er...@gmail.com>.

Caches are tied to the searcher that built them. So whenever you open a new
reader, the caches are rebuilt for that searcher. If you are picking up your
changes, you MUST be opening a new reader, so yes, indeed, your caches are
being flushed.

You can get around this by firing a few warmup queries at the server before
using it "for real".

If you are opening a new reader for each request, well, you shouldn't do
that <G>.

Best
Erick
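
A minimal sketch of the kind of warmup query described above, assuming it is declared in solrconfig.xml inside the <query> section. The query text and sort field are taken from the original post; the rows value is an assumption. The "firstSearcher" event covers the server-startup case, before any searcher is serving requests:

```xml
<!-- solrconfig.xml, inside <query>: run a sorted query at startup so the
     sort cache for "id" is populated before real traffic arrives. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <!-- rows is an assumed value; warming only needs the sort to run -->
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

The match set of the warming query matters less than the fact that it sorts on the same field the real queries will sort on.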

On Mon, Oct 20, 2008 at 9:02 AM, christophe <ch...@lemoine-fr.com>wrote:

> When I start indexing new documents, searches are taking long time again:
> is the sort cache flushed when new documents are indexed ?
>
> Thanks
> Christophe

sint and sfloat FieldCache "type" -- was: RE: Sorting performance

Posted by Chris Hostetter <ho...@fucit.org>.

Replying to a post on solr-user ...

: From: Lance Norskog
: Subject: RE: Sorting performance

: Since 'sint' is needed to do range queries on a field, and 'int' is needed
: for efficient sorting, we wound up having one field of each type and a
: <copyField> to make sure they both get the same numbers.  Yes, it's
: annoying. 

it never really occurred to me before, until this post, when i went to 
double check the code in SortableIntField and SortableFloatField, but why 
do they use SortField.STRING instead of a SortComparator that passes a 
custom IntParser or FloatParser to FieldCache.getInts() and 
FieldCache.getFloats()?

I actually thought it was working this way until i noticed this comment ... 
is the assumption that it's better to just use the StringIndex because 
it's faster to build even though it takes more memory?

At the very least it seems like it could be a worthwhile option to expose 
for these fieldtypes ... use less RAM to sort, but have warming queries 
take longer.

(Wouldn't switching to the int and float FieldCaches also make ValueSource 
queries faster since the parsing only would be done once?)



-Hoss

RE: Sorting performance

Posted by Lance Norskog <go...@gmail.com>.

According to previous posters on this topic, sorting requires an array with
an entry per document in the entire index. Each entry has 32 bits for the
'int' type, and 32 bits plus the field representation length for other types.
Not knowing Lucene internals, I have a hard time believing that it really has
to be this wasteful, but oh well.

Since 'sint' is needed to do range queries on a field, and 'int' is needed
for efficient sorting, we wound up having one field of each type and a
<copyField> to make sure they both get the same numbers.  Yes, it's
annoying. 
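
A rough sketch of the two-field arrangement described above, assuming a Solr 1.3-style schema.xml. The field names "price" and "price_sort" are hypothetical, and the type names follow the stock example schema, where the plain integer type (what the post calls 'int') is named "integer":

```xml
<!-- schema.xml, in <types>: one sortable-encoded type for range queries,
     one plain integer type used only for sorting. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>

<!-- In <fields>: hypothetical field names, for illustration only -->
<field name="price" type="sint" indexed="true" stored="true"/>
<field name="price_sort" type="integer" indexed="true" stored="false"/>

<!-- Directly under <schema>: keep both fields fed from the same input -->
<copyField source="price" dest="price_sort"/>
```

Range queries would then target "price" and the sort parameter would name "price_sort".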

-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com] 
Sent: Monday, October 20, 2008 6:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Sorting performance

christophe wrote:
> When I start indexing new documents, searches are taking long time
> again: is the sort cache flushed when new documents are indexed ?

When you commit, a new Reader will be opened (or reopened) so that the
freshly added docs can be seen. This would make the first search slow again,
but if you have the warming queries, it should be warmed before being put
into use. Be sure the warming query sorts on the right field.

>
> Are there any metrics on how to compute memory requirements (based on 
> doc average size, number of sorted fields, number of indexed documents
> + number of new document / day) ?

Depends on the field type, but I think it's 32 bits x numDocs for most 
datatypes, with the String datatype also requiring an array of all the 
unique terms to index into. That's not everything, but it dominates.
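
As a rough worked example of that rule of thumb, assuming the 7M-document index mentioned at the start of the thread and counting only the per-document array (not the unique-term storage a String sort also needs):

```latex
\[
7{,}000{,}000 \ \text{docs} \times 4 \ \text{bytes}
  = 28{,}000{,}000 \ \text{bytes}
  \approx 26.7 \ \text{MiB per sorted field}
\]
```

For a unique string key such as id, the term array holds one entry per document on top of this, which is presumably where most of the memory goes in the case discussed here.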


> Thanks
> Christophe
> Mark Miller wrote:
>> You need to setup a warming query that sorts so that the initial long 
>> query is done behind the scenes. Users first query will then be fast. 
>> Solrconfig.
>>
>> - Mark
>>
>>
>> On Oct 18, 2008, at 1:34 AM, christophe <ch...@lemoine-fr.com> 
>> wrote:
>>
>>> Here are the memory parameters I'm using now(Tomcat): -Xms2024m 
>>> -Xmx2024m
>>> With those values, the second query is way faster. Only the first 
>>> one is very slow.
>>> Thanks for the tip.
>>> However, I'm wondering if will be enough and I will not hit the same 
>>> issues when I will have many users searching at the same time: I 
>>> will do a stress test to check this.
>>>
>>> Thanks
>>> Christophe
>>>
>>> christophe wrote:
>>>> It is slow each time I run it. (I test it from the Solr admin 
>>>> console or from a JAVA program using the Http client).
>>>> I do not get the OOM each time.
>>>>
>>>> Thx
>>>> Christophe
>>>>
>>>> Otis Gospodnetic wrote:
>>>>> Is the sorted query slow only the first time or every time you run 
>>>>> it?
>>>>>
>>>>> You got an OOM?  What -Xmx value are you using?  Try increasing it.
>>>>>
>>>>> Otis
>>>>> -- 
>>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message ----
>>>>>
>>>>>> From: christophe <ch...@lemoine-fr.com>
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Sent: Friday, October 17, 2008 1:28:52 PM
>>>>>> Subject: Sorting performance
>>>>>> Hi,
>>>>>>
>>>>>> I'm doing some tests with Solr1.3
>>>>>> I have loaded around 7M documents, each with a few stored and 
>>>>>> indexed fields.
>>>>>>
>>>>>> This query: text:sometext returns the results, sorted by score in 
>>>>>> a few milliseconds. (I display 10 out of 8747 matched documents)
>>>>>> This one: text:sometext;id desc   takes something like 60s or 
>>>>>> more to return the data (when it doesn't fails with an out of 
>>>>>> memory error). (id is a string type).
>>>>>> I have tried to display only id, same results.
>>>>>>
>>>>>> Any ideas ? I'm sure I'm doing something wrong.....
>>>>>>
>>>>>> My schema is based on the sample, with the following fields:
>>>>>>
>>>>>>  />           multiValued="true" />
>>>>>>  default="NOW" multiValued="false"/>
>>>>>>
>>>>>> Thanks
>>>>>> Christophe
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

I'm now wondering whether Solr (Lucene) is a good choice when we have a 
huge number of indexed documents and a large number of new documents 
need to be indexed every day.

Maybe I'm wrong, but my feeling is that with the way the sort caches are 
handled (recreated after each new commit, not shared between Searchers), 
the solution does not scale. And it is not just a memory issue (memory is 
cheap), but more the inability to update an existing cache.

I'm testing whether I can sort on a field that might be faster to cache: any 
hints on this ? Would it make a difference if I use a field with fewer 
distinct values than a timestamp ? I'm looking for some details on how 
the cache is populated on the first query. Also, for the code insiders 
;-), would it be difficult to change this caching mechanism to allow 
update and reuse of an existing cache ?

Thanks for your help
Christophe


Re: Sorting performance + replication of index between cores

Posted by Sreeram Vaidyanathan <nv...@live.com>.

Did u guys find a solution?
I am having a similar issue.

Setup:
One indexer box & 2 searcher boxes, each having 6 different solr-cores.
We have a lot of updates (in the range of a couple thousand items every few
mins).
The Snappuller/Snapinstaller pulls and commits every 5 mins.

Query response time peaks to 60+ seconds when a new searcher is being
prepared.
I have disabled the caches (filter, query & document). 

We have a strict requirement of response time < 10 secs all the time.

Thanks
Sreeram


-- 
View this message in context: http://www.nabble.com/Sorting-performance-tp20037712p25286018.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Sorting performance + replication of index between cores

Posted by sunnyfr <jo...@gmail.com>.

Hi Christophe, 

Did you find a way to fix your problem? Even with replication we will
have this problem: a lot of updates means clearing the cache and managing that.
I have the same issue; I'm just wondering whether I should take servers offline
during updates ??? 
How did you fix that ? 

Thanks,
sunny




Re: Sorting performance + replication of index between cores

Posted by christophe <ch...@lemoine-fr.com>.

Hi,

After fully reloading my index, using a field other than a Date does not 
help that much.
Using a warmup query avoids having the first request slow, but:
     - Frequent commits mean that the Searcher is reloaded frequently 
and, as the warmup takes time, the clients must wait.
     - Having warmup slows down the indexing process (I guess this is 
because after a commit, the Searchers are recreated)

So I'm considering, as suggested, having two instances: one for 
indexing and one for searching.
I was wondering if there are simple ways to replicate the index in a 
single Solr server running two cores ? Any such config already tested ? 
I guess that the standard replication based on rsync can be simplified a 
lot in this case as the two indexes are on the same server.

Thanks
Christophe


Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

Hi,

I'm now reloading my index.
The issue might be related to the way dates are handled (I was sorting 
on a date field).
Now, I have added an integer field that represents the date (but in 
minutes instead of milliseconds).
With 4M documents (and indexing running in background), I have a correct 
response time, even for the first query. I still want to check with 10M 
and more documents.

Once my index is fully loaded, I will try the config parameters you suggest.

Thanks
Christophe


RE: Sorting performance

Posted by Beniamin Janicki <be...@mippin.com>.

:so you can send your updates anytime you want, and as long as you only 
:commit every 5 minutes (or commit on a master as often as you want, but 
:only run snappuller/snapinstaller on your slaves every 5 minutes) your 
:results will be at most 5minutes + warming time stale.

This is what I do as well (commits are done once per 5 minutes). I've got a
master-slave configuration. The master has all caches turned off (commented
out in solrconfig.xml) and is set up with only 2 maxWarmingSearchers. The
index size is 5GB, Xmx=1GB, and committing takes around 10 secs (with the
default configuration with warming it took from 30 mins up to 2 hours). 

Slave caches are configured to have autowarmCount="0" and
maxWarmingSearchers=1, and I have new data 1 second after the snapshot is
done. I haven't noticed any huge delays while serving search requests.
Try those values - maybe they'll help in your case too.

Ben Janicki
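
A minimal sketch of what the slave settings described above might look like in solrconfig.xml. Only autowarmCount="0" and the maxWarmingSearchers value come from the post; the cache class and sizes are assumed placeholders modelled on the stock example config:

```xml
<!-- solrconfig.xml on a slave: keep the caches but skip autowarming,
     so a freshly installed snapshot becomes searchable quickly. -->
<query>
  <!-- size/initialSize are assumed placeholder values -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- allow only one searcher to be warming at a time -->
  <maxWarmingSearchers>1</maxWarmingSearchers>
</query>
```

Note that these settings only affect Solr's own caches; the sort cache discussed earlier in the thread is populated by the first sorted query against each new searcher, so it is not controlled by autowarmCount.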



Re: Sorting performance

Posted by Chris Hostetter <ho...@fucit.org>.

: The problem is that I will have hundreds of users doing queries, and a
: continuous flow of document coming in.
: So a delay in warming up a cache "could" be acceptable if I do it a few times
: per day. But not on a too regular basis (right now, the first query that loads
: the cache takes 150s).
: 
: However: I'm not sure why it looks not to be a good idea to update the caches

you can refresh the caches automatically after updating: the "newSearcher" 
event is fired whenever a searcher is opened (but before it's used by 
clients), so you can configure warming queries for it -- it doesn't have to 
be done manually (or by the first user to use that reader)

so you can send your updates anytime you want, and as long as you only 
commit every 5 minutes (or commit on a master as often as you want, but 
only run snappuller/snapinstaller on your slaves every 5 minutes) your 
results will be at most 5 minutes + warming time stale.


-Hoss
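
A sketch of such a "newSearcher" warming query, assuming the stock solr.QuerySenderListener and a sort on the id field from the original post. The match-all query and the rows value are assumptions; any query that sorts on the right field should populate the sort cache:

```xml
<!-- solrconfig.xml, inside <query>: fired for every searcher opened after a
     commit or snapinstall, before that searcher starts serving clients. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- assumed query; the sort field is what matters for warming -->
      <str name="q">*:*</str>
      <str name="sort">id desc</str>
      <str name="rows">1</str>
    </lst>
  </arr>
</listener>
```

While this runs, the previous searcher keeps answering requests, which is why results can be stale by up to the commit interval plus the warming time.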

Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

The problem is that I will have hundreds of users doing queries, and a 
continuous flow of documents coming in.
So a delay in warming up a cache "could" be acceptable if I do it a few 
times per day. But not on a too regular basis (right now, the first 
query that loads the cache takes 150s).

However: I'm not sure why it seems not to be a good idea to update the 
caches when updates are committed ? Any centralized cache (memcached is 
a good one) that is kept up to date by the update/commit process 
would be great. Config options could then let the user decide whether 
the cache is shared between servers or not. Creating a new cache and 
then swapping it in will double the necessary memory.

I also have a related question regarding readers: a new reader is 
opened when documents are committed. And the cache is associated with 
the reader (if I got it right). Are all user requests served by this 
reader ? How does that scale if I have many concurrent users ?

C.


Re: Sorting performance

Posted by Norberto Meijome <nu...@gmail.com>.

On Mon, 20 Oct 2008 16:28:23 +0300
christophe <ch...@lemoine-fr.com> wrote:

> Hmm... this means I have to wait before I index new documents and avoid 
> indexing them when they are created (I have about 50 000 new documents 
> created each day and I was planning to make those searchable ASAP).

you can always index + optimize out of band on a 'master' / RW server, and
then send the updated index to your slave (the one actually serving the
requests).

This *will NOT* remove the need to refresh your cache, but it will remove any
delay introduced by commit/indexing + optimize.

> Too bad there is no way to have a centralized cache that can be shared 
> AND updated when new documents are created.

hmm not sure it makes sense like that... but maybe along the lines of having an
active cache that is used to serve queries, and new ones being prepared, and
then swapped when ready. 

Speaking of which (or not :P), has anyone thought about / done any work on
using memcached for these internal solr caches? I guess it would make sense for
setups with several slaves (or even a master updating memcached
too...), though for a setup with shards it would be slightly more involved
(although it *could* be used to support several slaves per 'data shard').

All the best,
B

Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

Hmm... this means I have to wait before I index new documents and avoid 
indexing them when they are created (I have about 50 000 new documents 
created each day and I was planning to make those searchable ASAP).
Too bad there is no way to have a centralized cache that can be shared 
AND updated when new documents are created.

C.


Re: Sorting performance

Posted by Mark Miller <ma...@gmail.com>.

christophe wrote:
> When I start indexing new documents, searches take a long time 
> again: is the sort cache flushed when new documents are indexed?

When you commit, a new Reader will be opened (or reopened) so that the 
freshly added docs can be seen. This would make the first search slow 
again, but if you have the warming queries, it should be warmed before 
being put into use. Be sure the warming query sorts on the right field.

>
> Are there any metrics on how to compute memory requirements (based on 
> average doc size, number of sorted fields, number of indexed documents 
> + number of new documents / day)?

Depends on the field type, but I think it's 32 bits x numDocs for most 
datatypes, with the String datatype also requiring an array of all the 
unique terms to index into. That's not everything, but it dominates.
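
A back-of-the-envelope version of that estimate, as a minimal sketch rather than an exact accounting. The per-string overhead and the average id length below are assumptions, not measured values, and the function name is made up for illustration. The 7M document count is from the thread, and the id field is assumed to be unique per document.

```python
def sort_cache_bytes(num_docs, unique_terms=0, avg_term_chars=0,
                     per_string_overhead=40):
    """Rough memory needed to sort on one field.

    32 bits (4 bytes) per document, plus, for String fields, one entry per
    unique term. Each term is costed as 2 bytes per character plus a fixed
    per-object overhead (ASSUMED to be 40 bytes here).
    """
    per_doc_array = 4 * num_docs
    term_array = unique_terms * (2 * avg_term_chars + per_string_overhead)
    return per_doc_array + term_array


# 7M docs; id assumed unique, so 7M unique terms; 10-char ids are an assumption
total = sort_cache_bytes(7_000_000, unique_terms=7_000_000, avg_term_chars=10)
print(total / 1e6)  # 448.0 (megabytes, 10^6 bytes)
```

Under these assumptions the term array, not the 32-bit-per-document array, is what dominates for a unique String field such as id.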



query parsing issue + behavior as OR (solr 1.4-dev)

Posted by Sunil Sarje <su...@yahoo.com>.

I am working with the nightly build of Oct 17, 2008 and found what looks like a problem with LuceneQParserPlugin: it treats + as OR.

e.g. q=first_name:joe+last_name:smith behaves as OR instead of AND.
The default operator is set to AND in schema.xml:
<solrQueryParser defaultOperator="AND"/>


Is there any new configuration I need to put in place in order to get this working ?
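
One thing worth checking independently of the parser: in a URL query string, a bare + is form-decoded to a space before the query parser ever sees it. The sketch below only demonstrates standard URL decoding using Python's urllib; whether this accounts for the OR behaviour in the nightly build is not established here.

```python
from urllib.parse import parse_qs, urlencode

# What the server-side decoder yields for the query string as sent
raw = "q=first_name:joe+last_name:smith"
decoded = parse_qs(raw)["q"][0]
print(decoded)  # first_name:joe last_name:smith

# To send literal '+' operators, they must be percent-encoded as %2B
explicit = urlencode({"q": "+first_name:joe +last_name:smith"})
print(explicit)  # q=%2Bfirst_name%3Ajoe+%2Blast_name%3Asmith

round_trip = parse_qs(explicit)["q"][0]
print(round_trip)  # +first_name:joe +last_name:smith
```

So the parser receives two space-separated clauses, and how they are combined is then decided by the default operator rather than by the + itself.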

Thanks
-Sunil

Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

When I start indexing new documents, searches take a long time 
again: is the sort cache flushed when new documents are indexed?

Thanks
Christophe


Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

Will do so. Thanks.
Are there any metrics on how to compute memory requirements (based on 
average doc size, number of sorted fields, number of indexed documents + 
number of new documents / day)?

Thanks
Christophe



Re: Sorting performance

Posted by Mark Miller <ma...@gmail.com>.

You need to set up a warming query that sorts, so that the initial long  
query is done behind the scenes. Users' first query will then be fast.  
It's configured in solrconfig.xml.

- Mark



Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

Here are the memory parameters I'm using now (Tomcat): -Xms2024m -Xmx2024m
With those values, the second query is way faster. Only the first one is 
very slow.
Thanks for the tip.
However, I'm wondering whether that will be enough, or whether I will hit the 
same issues when many users are searching at the same time: I will do 
a stress test to check this.

Thanks
Christophe


Re: Sorting performance

Posted by christophe <ch...@lemoine-fr.com>.

It is slow each time I run it. (I test it from the Solr admin console or 
from a Java program using the HTTP client.)
I do not get the OOM each time.

Thx
Christophe

Otis Gospodnetic wrote:
> Is the sorted query slow only the first time or every time you run it?
>
> You got an OOM?  What -Xmx value are you using?  Try increasing it.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>   
>> From: christophe <ch...@lemoine-fr.com>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, October 17, 2008 1:28:52 PM
>> Subject: Sorting performance 
>>
>> Hi,
>>
>> I'm doing some tests with Solr 1.3
>> I have loaded around 7M documents, each with a few stored and indexed 
>> fields.
>>
>> This query: text:sometext returns the results, sorted by score in a few 
>> milliseconds. (I display 10 out of 8747 matched documents)
>> This one: text:sometext;id desc   takes something like 60s or more to 
>> return the data (when it doesn't fail with an out of memory error). (id 
>> is a string type).
>> I have tried to display only id, same results.
>>
>> Any ideas ? I'm sure I'm doing something wrong.....
>>
>> My schema is based on the sample, with the following fields:
>>
>> [field definitions lost in the archived copy; only these fragments survive]
>>   multiValued="true" />
>>   default="NOW" multiValued="false"/>
>>
>>
>> Thanks
>> Christophe
>>     
>
>