Posted to solr-user@lucene.apache.org by Anca Kopetz <an...@kelkoo.com> on 2013/08/05 10:42:40 UTC

"optimize" index : impact on performance [Republished]

Hi,

[I am resending my message to the mailing list, along with Shawn's reply. Thanks, Shawn, for your explanations.]

We are trying to improve the performance of our Solr Search application in terms of QPS (queries per second).

We tuned some Solr settings (e.g. mergeFactor=3) and ran several benchmarks. Performance improved, but it is still not sufficient for our traffic volume.
We then optimized the index down to 1 segment with 0 deleted docs, and QPS rose by 40% compared to the previous test.

We are therefore considering optimizing the index every two hours, because the index changes with frequent commits (every 30 minutes) and performance degrades over time.

1. Is this good practice?
2. Instead of running an "optimize" many times a day, are there other parameters that we can tune and test to improve average QPS?
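
For readers unfamiliar with how an explicit optimize is requested: it is normally sent to Solr's update handler as request parameters. The sketch below only builds the request URL and sends nothing. The host, port, and collection name are placeholders and are not taken from this thread, and the helper function itself is hypothetical.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments=1, wait_searcher=True):
    """Build the URL of an optimize request for Solr's update handler.

    base_url and collection are placeholders; adjust them to your cluster.
    """
    params = {
        "optimize": "true",
        "maxSegments": max_segments,  # merge down to at most this many segments
        "waitSearcher": "true" if wait_searcher else "false",
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Hypothetical node and collection name, for illustration only.
url = optimize_url("http://localhost:8983/solr", "collection1")
print(url)
```

Scheduling such a request every two hours (for example from cron) would match the plan described above; whether that is a good idea is the question being asked.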

We want to avoid the solution of adding more servers to our SolrCloud cluster.

Some details of our system:

SolrCloud cluster: 8 nodes on 8 dedicated servers; 2 shards / 4 replicas
Hardware configuration: 2 processors (16 CPU cores) per server; 24GB of memory; 6GB allocated to JVM
Index: 13M documents, 15GB
Search algorithm: grouping, faceting, filter queries
Solr version 4.4

[Shawn's reply:]

Please read and follow this note about thread hijacking:

http://people.apache.org/~hossman/#threadhijack

Optimizing that frequently with an index that large *might* cause more
problems than it solves.  You'd have to actually try it to see whether
it works for you, though.  Here's some information explaining why it may
be a problem:

Optimizing a 15GB index is likely to take up to 15 minutes, depending on
how fast the I/O subsystem on your servers is.  It probably won't happen
in less than 5 minutes unless you're running on SSD, which also
mitigates some of the impact described in the next paragraph.

Performance will be lower, potentially a LOT lower, for those few
minutes while an optimize is occurring.  Solr has to read the index,
process each document, and write it back out.  It does happen quite
fast, but that's a lot of I/O.  Because it's continually going back and
forth between the old copy and the new copy, the OS disk cache will have
critical data evicted for the entire process, unless you have enough
free RAM so *twice* the index can fit in the cache, and from your
mentioned stats, you don't.
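
To put the duration estimate above next to the proposed two-hour schedule, here is a rough back-of-the-envelope sketch. It assumes the 5 and 15 minute figures are the bounds of interest and treats the 15GB index as written out once in full; both function names are made up for this illustration, and real merge I/O also includes reading the old segments.

```python
def optimize_duty_cycle(optimize_minutes, interval_minutes):
    """Fraction of each scheduling interval spent running an optimize."""
    return optimize_minutes / interval_minutes


def implied_write_rate_mb_s(index_gb, optimize_minutes):
    """Average write rate needed to rewrite the index once in the given time."""
    return index_gb * 1024 / (optimize_minutes * 60)


# Figures from this thread: 15GB index, optimize every 120 minutes.
for minutes in (5, 15):
    share = optimize_duty_cycle(minutes, 120)
    rate = implied_write_rate_mb_s(15, minutes)
    print(f"{minutes} min optimize: {share:.1%} of each window, about {rate:.0f} MB/s written")
```

Under these assumptions, search would run with degraded I/O for somewhere between roughly 4% and 12.5% of the time.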

FYI, commits every 30 minutes are NOT frequent.  Commits happening one
or more times every *second* are frequent.

If you can share your solrconfig.xml, there might be some suggestions we
can make so things will generally work better.  The list doesn't accept
attachments.  It's better if you use a paste website like
http://www.fpaste.org/, choose the proper language for highlighting, and
set the "delete after" setting to something that will work for you.
Making it a paste that never gets deleted will mean that your message
will retain usefulness for others as long as archives exist, but you
might not want it available that long.

Properly tuning your garbage collection is important.  The default
garbage collector is, risking a pun, garbage.

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn




________________________________
Kelkoo SAS
Société par Actions Simplifiée
Share capital: € 4.168.964,30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for their addressees. If you are not the intended recipient of this message, please delete it and notify the sender.

Re: "optimize" index : impact on performance [Republished]

Posted by Anca Kopetz <an...@kelkoo.com>.
Hi,

We already ran some benchmarks while an optimize was in progress and did not notice a big impact on overall search performance. The results were almost the same with and without an optimize running. We have enough free RAM to keep two copies of the index in the OS disk cache during an optimize: 15GB is the total size of our index, which is split into 2 shards, so each server holds 7.5GB of index and has 24GB of memory. We will run the benchmarks again, though, in case we missed something.
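
The memory argument can be checked with simple arithmetic. This is a simplified sketch: it assumes that all RAM outside the 6GB JVM heap is available to the OS disk cache, which ignores the operating system itself, other processes, and off-heap JVM memory. The function name is made up for this illustration.

```python
def cache_headroom_gb(total_ram_gb, jvm_heap_gb, index_gb_per_server):
    """RAM left over after caching two full copies of the local index.

    A negative result means two copies do not fit in the OS disk cache.
    """
    available_for_cache = total_ram_gb - jvm_heap_gb
    return available_for_cache - 2 * index_gb_per_server


# If each server held the whole 15GB index (the assumption in Shawn's reply).
print(cache_headroom_gb(24, 6, 15.0))

# With 2 shards, each server holds 7.5GB (the actual layout).
print(cache_headroom_gb(24, 6, 7.5))
```

With the whole index on each server the result is negative, while with one 7.5GB shard per server about 3GB remains, which is consistent with the benchmarks showing little impact.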

You can find our solrconfig.xml file here: http://fpaste.org/30154/

Thanks for the URLs on GC tuning; we will test them.

Best regards,
Anca
