Posted to solr-user@lucene.apache.org by Eric Katherman <ka...@gmail.com> on 2013/11/12 17:26:41 UTC

optimization suggestions

Stats:
default config for 4.3.1 on a high memory AWS instance using jetty.
Two collections each with less than 700k docs per collection.

We seem to hit some performance lags when doing large commits. Our front-end service allows customers to import data, which is stored in Mongo and then indexed in Solr. We keep all of that data and do one big commit at the end rather than committing each record along the way.

Would it be better to use something like autoSoftCommit and just commit each record as it comes in? Or is the problem more about disk IO? Are there other "low hanging fruit" things we should consider? The Solr dashboard shows that there is still plenty of free memory during these imports, so it isn't running out of memory and falling back to disk.
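
For reference, autoSoftCommit is configured in solrconfig.xml alongside autoCommit; a minimal sketch, with an illustrative one-second interval rather than a value from this thread:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>1000</maxTime> <!-- illustrative: make new docs searchable at most once per second; soft commits do not flush segments to disk -->
    </autoSoftCommit>
  </updateHandler>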

Thanks!
Eric

Re: optimization suggestions

Posted by Andre Bois-Crettez <an...@kelkoo.com>.
I suggest setting autoCommit to an interval as large as your memory allows
(e.g. 15 minutes) to flush the update log to disk and start merging
segments, without making anything visible to search yet.
Then at the end, send an explicit <commit/>, which will both persist the
remaining indexed docs to disk and make everything visible to search
(and rebuild the caches).
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
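
A minimal sketch of that setup in solrconfig.xml, assuming the 15-minute interval from the example above (openSearcher=false is what keeps the flushed segments invisible to searches until the final commit):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>900000</maxTime>          <!-- 15 minutes, in milliseconds -->
      <openSearcher>false</openSearcher> <!-- flush and merge, but don't open a new searcher -->
    </autoCommit>
  </updateHandler>

and the explicit commit posted to the update handler once the import is done:

  <commit/>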

This should make the indexing load much smoother.

With autowarm at 100% on all the caches, query performance should
not drop too much:
http://wiki.apache.org/solr/SolrCaching
You may have to add static warming queries too.
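
A sketch of the relevant solrconfig.xml pieces; the cache sizes and the warming query below are placeholders, not values from this thread:

  <query>
    <!-- re-seed the caches from the previous searcher on every commit -->
    <filterCache      class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="100%"/>
    <queryResultCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="100%"/>

    <!-- static warming queries, run against each new searcher before it starts serving traffic -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- placeholder query; assumes an "id" field and a sort that matches real traffic -->
        <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
      </arr>
    </listener>
  </query>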


André

On 11/12/2013 06:04 PM, Shalin Shekhar Mangar wrote:
> On Tue, Nov 12, 2013 at 9:56 PM, Eric Katherman <ka...@gmail.com> wrote:
>> Stats:
>> default config for 4.3.1 on a high memory AWS instance using jetty.
>> Two collections each with less than 700k docs per collection.
>>
>> We seem to hit some performance lags when doing large commits. Our front-end service allows customers to import data, which is stored in Mongo and then indexed in Solr. We keep all of that data and do one big commit at the end rather than committing each record along the way.
> What do you mean by a performance lag? Large query times after a
> commit are to be expected. They can be managed better with good auto
> warming queries.
>
>> Would it be better to use something like autoSoftCommit and just commit each record as it comes in? Or is the problem more about disk IO? Are there other "low hanging fruit" things we should consider? The Solr dashboard shows that there is still plenty of free memory during these imports, so it isn't running out of memory and falling back to disk.
> Committing after every record is bound to slow things down even more.
> Batched updates are almost always better. Perhaps you need to tune
> your auto commit settings to commit in smaller batches rather than in
> one big bang at the end.
>
>> Thanks!
>> Eric
>
>
>
--
André Bois-Crettez

Software Architect
Search Developer
http://www.kelkoo.com/


Re: optimization suggestions

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Nov 12, 2013 at 9:56 PM, Eric Katherman <ka...@gmail.com> wrote:
> Stats:
> default config for 4.3.1 on a high memory AWS instance using jetty.
> Two collections each with less than 700k docs per collection.
>
> We seem to hit some performance lags when doing large commits. Our front-end service allows customers to import data, which is stored in Mongo and then indexed in Solr. We keep all of that data and do one big commit at the end rather than committing each record along the way.

What do you mean by a performance lag? Large query times after a
commit are to be expected. They can be managed better with good auto
warming queries.

>
> Would it be better to use something like autoSoftCommit and just commit each record as it comes in? Or is the problem more about disk IO? Are there other "low hanging fruit" things we should consider? The Solr dashboard shows that there is still plenty of free memory during these imports, so it isn't running out of memory and falling back to disk.

Committing after every record is bound to slow things down even more.
Batched updates are almost always better. Perhaps you need to tune
your auto commit settings to commit in smaller batches rather than in
one big bang at the end.
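
For example, a sketch of batch-sized commits via autoCommit in solrconfig.xml; the thresholds are placeholders, not values from this thread:

  <autoCommit>
    <maxDocs>10000</maxDocs>           <!-- hard commit after 10k buffered docs... -->
    <maxTime>60000</maxTime>           <!-- ...or after 60 seconds, whichever comes first -->
    <openSearcher>false</openSearcher> <!-- keep intermediate commits cheap; visibility comes from the final explicit commit -->
  </autoCommit>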

>
> Thanks!
> Eric



-- 
Regards,
Shalin Shekhar Mangar.