Posted to solr-user@lucene.apache.org by Daniel Exner <da...@esemos.de> on 2012/11/29 11:15:59 UTC

Benchmarking Solr 3.3 vs. 4.0

Hi Solr community,

I'm currently doing some benchmarking of a real Solr 3.3 instance vs the 
same ported to Solr 4.0.

Benchmarking is done using JMeter from localhost.
Test scenario is a constant stream of queries from a log file out of 
production, at targeted 50 QPS.
After some time (marked in the graph) I push the whole index data (796M 
XML) via the REST interface, wait some time, and then do an optimize via REST.
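
The push and optimize steps can be sketched outside JMeter roughly as follows. This is only an illustration: the base URL, the sample payload, and the helper names are assumptions and do not come from the setup described here. The sketch builds the two HTTP requests against Solr's XML update handler without sending them.

```python
import urllib.request

# Assumed base URL; the thread does not state the real host, port, or core name.
SOLR_BASE = "http://localhost:8080/solr"
XML_HEADERS = {"Content-Type": "text/xml; charset=utf-8"}


def build_push_request(xml_payload: bytes, base: str = SOLR_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST of <add> XML to the update handler.

    commit=true asks Solr to commit once the documents have been added.
    """
    return urllib.request.Request(
        url=base + "/update?commit=true",
        data=xml_payload,
        headers=XML_HEADERS,
        method="POST",
    )


def build_optimize_request(base: str = SOLR_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST of the <optimize/> update message."""
    return urllib.request.Request(
        url=base + "/update",
        data=b"<optimize/>",
        headers=XML_HEADERS,
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical one-document payload standing in for the real 796M XML file.
    sample = b'<add><doc><field name="id">1</field></doc></add>'
    for request in (build_push_request(sample), build_optimize_request()):
        print(request.get_method(), request.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it. For a file of several hundred megabytes it is probably better to stream the body than to read it into memory as done here.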

The test machine is a VM on an "Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz" 
with one core and 2 GB RAM attached.
Both Solr instances run in the same Tomcat and are used only for 
testing.

The expected results were a lower overall load for Solr 4 and lower 
latency while pushing new data.

In the graph you can see high CPU load all the time. This is the case 
even if I reduce the rate to 5 QPS, so CPU load is not a good metric for 
comparing Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin occasionally 
timing out.

You can also see no real increase in latency when pushing data into the 
index. This puzzles me, as rumour has it that one should not push new 
data under high load because this would hurt query performance.

Has anyone done similar tests before and can comment on that?

Greetings
Daniel Exner
-- 
Daniel Exner
Softwaredevelopment & Applicationsupport
ESEMOS GmbH

Re: Benchmarking Solr 3.3 vs. 4.0

Posted by Daniel Exner <da...@esemos.de>.

Shawn Heisey wrote:
[..]
>
> For best results, you'll want to ensure that Solr4 is working completely
> from scratch, that it has never seen a 3.3 index, so that it will use
> its own native format.
That's what I did in the second run. Thanks for clarifying that this is 
in fact better. :)

> It may be a good idea to look into the example
> Solr4 config/schema and see whether there are improvements you can
> make.  One note: the updateLog feature in the update handler config will
> generally cause performance to be lower.  The features that require
> updateLog would make this less of an apples to apples comparison, so I
> wouldn't enable it unless I knew I needed it.
I'll have a look at the updateLog feature. But I'm pretty sure it's disabled.
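
One way to check this rather than rely on memory is to look for an `<updateLog>` element inside `<updateHandler>` in solrconfig.xml, which is where the Solr 4 example config enables the feature. The following is a minimal sketch, assuming a plain solrconfig.xml without XInclude or similar indirection. The function name and the two inline configs are made up for illustration and are not the files used in this benchmark.

```python
import xml.etree.ElementTree as ET


def update_log_enabled(solrconfig_xml: str) -> bool:
    """Return True if <updateHandler> contains an <updateLog> element.

    XML comments are dropped by the parser, so a commented-out
    <updateLog> block counts as disabled.
    """
    root = ET.fromstring(solrconfig_xml)
    return root.find("./updateHandler/updateLog") is not None


# Hypothetical minimal configs, only for demonstration.
WITH_LOG = """<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
</config>"""

WITHOUT_LOG = """<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- <updateLog><str name="dir">${solr.ulog.dir:}</str></updateLog> -->
  </updateHandler>
</config>"""

if __name__ == "__main__":
    print("with log:", update_log_enabled(WITH_LOG))
    print("without log:", update_log_enabled(WITHOUT_LOG))
```

For a real installation the file contents would be read from the core's conf directory and passed to `update_log_enabled`.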

> Unless the lines are labelled wrong in the legend, the graph does show
> higher CPU usage during the push, but lower CPU usage during the
> optimize and most of the rest of the time.
Slightly, but I was also expecting higher latency. The raw data also shows 
that the box is unable to deliver CPU stats to the PerfMon Plugin because 
of the high load. Perhaps I was expecting bigger differences, but if you 
say what I see is OK, I'm fine.
Can you comment on the high CPU load even at low QPS rates?
Is there a parameter that forces lower load during testing, at the cost 
of higher latencies, for a better comparison?

>
> The graph shows that Solr4 has lower latency than 3.3 during both the
> push and the optimize, as well as most of the rest of the time.  The
> latency numbers however are a lot higher than I would expect, seeming to
> average out at around 100 seconds (100000 ms).  That is terrible
> performance from both versions.  On my own Solr installation, which is
> distributed and has 78 million documents, I have a median latency of 8
> milliseconds and a 95th percentile latency of 248 milliseconds.
OK, I should relabel the y-axis because the data is in fact 1000 times too 
high. So latency is more like 10ms, which is quite good for high QPS rates.
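
To compare against figures like the median and 95th percentile quoted above, the raw samples can be rescaled and summarised in a few lines. This is a sketch under assumptions: the sample values are invented, the factor of 1000 is the mislabelling mentioned above, and the nearest-rank percentile is just one common definition, not necessarily the one JMeter reports.

```python
import math
import statistics


def rescale(samples, factor=1000.0):
    """Divide every raw sample by the factor the axis label was off by."""
    return [value / factor for value in samples]


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples given")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Invented raw values, 1000 times too high, with one slow outlier.
    raw = [8000, 9000, 10000, 11000, 12000, 9500, 10500, 250000, 9800, 10200]
    latencies_ms = rescale(raw)
    print("median ms:", statistics.median(latencies_ms))
    print("95th percentile ms:", percentile(latencies_ms, 95))
```

Reporting both numbers matters because a single slow request barely moves the median but dominates the upper percentiles.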


> Is this a 64-bit platform with a 64-bit Java?  How much memory have you
> allocated for the java heap?  How big is the index?

The VM I am using runs openSUSE 10.3 (i586), so there is no 64-bit Java 
at all (production does use it, though).
Tomcat Java parameters are:
"-Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m 
-XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:GCTimeRatio=10"
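
As a rough back-of-the-envelope check of these settings against the 2 GB of RAM attached to the VM, the sketch below parses the maximum heap and permanent generation sizes and subtracts them from the total. The helper names are hypothetical, and the result is only an upper bound on what is left for the OS and its disk cache, because thread stacks and other native JVM memory are not counted.

```python
import re

JAVA_OPTS = (
    "-Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m "
    "-XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:GCTimeRatio=10"
)

# Multipliers that convert a JVM size suffix to megabytes.
_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def size_mb(opts: str, prefix: str) -> float:
    """Return the size in MB given after `prefix`, e.g. '-Xmx' -> 1024."""
    match = re.search(re.escape(prefix) + r"(\d+)([kKmMgG])", opts)
    if match is None:
        raise ValueError(f"option {prefix!r} not found")
    return int(match.group(1)) * _TO_MB[match.group(2).lower()]


def remaining_mb(opts: str, total_ram_mb: float) -> float:
    """RAM left after max heap and max PermGen (an optimistic upper bound)."""
    heap = size_mb(opts, "-Xmx")
    perm = size_mb(opts, "-XX:MaxPermSize=")
    return total_ram_mb - heap - perm


if __name__ == "__main__":
    print("MB left outside heap and PermGen:", remaining_mb(JAVA_OPTS, 2048))
```

Whatever remains has to cover the OS, JMeter running on the same host, and the file system cache that Solr relies on for index reads.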

The number of docs is 266249 for both indices, which is quite small, but I 
may be able to use a much larger index and a much better machine in the 
near future.

Greetings
Daniel Exner
--
Daniel Exner
Softwaredevelopment & Applicationsupport
ESEMOS GmbH

Re: Benchmarking Solr 3.3 vs. 4.0

Posted by Shawn Heisey <so...@elyograg.org>.

On 11/29/2012 8:29 AM, Daniel Exner wrote:
> I'll answer both your mails in one.
>
> Shawn Heisey wrote:
>> On 11/29/2012 3:15 AM, Daniel Exner wrote:
>>> I'm currently doing some benchmarking of a real Solr 3.3 instance vs
>>> the same ported to Solr 4.0.
> [..]
>>> In the graph you can see high CPU load, all the time. This is even the
>>> case if I reduce the QPS down to 5, so CPU is no good metric for
>>> comparison between Solr 3.3 and 4.0 (at least on this machine).
>>> The missing memory data is due to the PerfMon JMeter Plugin having
>>> time-outs sometimes.
>>>
>>> You can also see no real increase in latency when pushing data into
>>> the index. This is puzzling me, as rumours say one should not push new
>>> data while under high load, as this would hurt query performance.
>>
>> I don't see any attachments, or any links to external attachments, so I
>> can't see the graph.  I can only make general statements, and I can't
>> guarantee that they'll even be applicable to your scenario.  You may
>> need to use an external attachment service and just send us a link.
> Indeed, it seems like the mailing list daemon scrubbed my attachment. 
> I dropped it into my Dropbox, here: http://db.tt/EjYCqbpn
>
>> Are you seeing lower performance, or just worried about the CPU load?
>> Solr4 should be able to handle concurrent indexing and querying better
>> than 3.x.  It is able to do things concurrently that were not possible
>> before.
> In general I'm interested in how much better Solr 4 performs and if it 
> may be feasible to use less powerful machines to get the same low 
> latency, or do more data pushes etc.
>
>> One way that performance improvements happen is that developers find
>> slow sections of code where the CPU is fairly idle, and rewrite them so
>> they are faster, but also exercise the CPU harder. When the new code
>> runs, CPU load goes higher, but it all runs faster.
> Graphs show a slightly better latency for Solr 4.0 compared to 3.3, 
> but not while pushing data.
>
>
>> Another note specifically related to this part: Have you used the same
>> configuration and done the minimal changes required to make it run, or
>> have you tried to update the config for 4.0 and its considerable list of
>> new features?  Did you start with a blank index on 4.0, or did you copy
>> the 3.3 index over?
> I used the same configuration and did the minimal changes.
> The first runs were using the same data from Solr 3.3 in Solr 4.0 (in 
> fact it was even the same data dir..) but further runs used freshly 
> filled different indices.

For best results, you'll want to ensure that Solr4 is working completely 
from scratch, that it has never seen a 3.3 index, so that it will use 
its own native format.  It may be a good idea to look into the example 
Solr4 config/schema and see whether there are improvements you can 
make.  One note: the updateLog feature in the update handler config will 
generally cause performance to be lower.  The features that require 
updateLog would make this less of an apples to apples comparison, so I 
wouldn't enable it unless I knew I needed it.

Unless the lines are labelled wrong in the legend, the graph does show 
higher CPU usage during the push, but lower CPU usage during the 
optimize and most of the rest of the time.

The graph shows that Solr4 has lower latency than 3.3 during both the 
push and the optimize, as well as most of the rest of the time.  The 
latency numbers however are a lot higher than I would expect, seeming to 
average out at around 100 seconds (100000 ms).  That is terrible 
performance from both versions.  On my own Solr installation, which is 
distributed and has 78 million documents, I have a median latency of 8 
milliseconds and a 95th percentile latency of 248 milliseconds.

Is this a 64-bit platform with a 64-bit Java?  How much memory have you 
allocated for the java heap?  How big is the index?

Thanks,
Shawn

Re: Benchmarking Solr 3.3 vs. 4.0

Posted by Daniel Exner <da...@esemos.de>.

I'll answer both your mails in one.

Shawn Heisey wrote:
> On 11/29/2012 3:15 AM, Daniel Exner wrote:
>> I'm currently doing some benchmarking of a real Solr 3.3 instance vs
>> the same ported to Solr 4.0.
[..]
>> In the graph you can see high CPU load, all the time. This is even the
>> case if I reduce the QPS down to 5, so CPU is no good metric for
>> comparison between Solr 3.3 and 4.0 (at least on this machine).
>> The missing memory data is due to the PerfMon JMeter Plugin having
>> time-outs sometimes.
>>
>> You can also see no real increase in latency when pushing data into
>> the index. This is puzzling me, as rumours say one should not push new
>> data while under high load, as this would hurt query performance.
>
> I don't see any attachments, or any links to external attachments, so I
> can't see the graph.  I can only make general statements, and I can't
> guarantee that they'll even be applicable to your scenario.  You may
> need to use an external attachment service and just send us a link.
Indeed, it seems like the mailing list daemon scrubbed my attachment. I 
dropped it into my Dropbox, here: http://db.tt/EjYCqbpn

> Are you seeing lower performance, or just worried about the CPU load?
> Solr4 should be able to handle concurrent indexing and querying better
> than 3.x.  It is able to do things concurrently that were not possible
> before.
In general I'm interested in how much better Solr 4 performs and whether 
it may be feasible to use less powerful machines to get the same low 
latency, or do more data pushes etc.

> One way that performance improvements happen is that developers find
> slow sections of code where the CPU is fairly idle, and rewrite them so
> they are faster, but also exercise the CPU harder.  When the new code
> runs, CPU load goes higher, but it all runs faster.
The graphs show slightly better latency for Solr 4.0 compared to 3.3, but 
not while pushing data.


> Another note specifically related to this part: Have you used the same
> configuration and done the minimal changes required to make it run, or
> have you tried to update the config for 4.0 and its considerable list of
> new features?  Did you start with a blank index on 4.0, or did you copy
> the 3.3 index over?
I used the same configuration and did the minimal changes.
The first runs were using the same data from Solr 3.3 in Solr 4.0 (in 
fact it was even the same data dir), but later runs used separate, 
freshly filled indices.


Greetings
Daniel Exner
--
Daniel Exner
Softwaredevelopment & Applicationsupport
ESEMOS GmbH

Re: Benchmarking Solr 3.3 vs. 4.0

Posted by Shawn Heisey <so...@elyograg.org>.

On 11/29/2012 3:15 AM, Daniel Exner wrote:
> I'm currently doing some benchmarking of a real Solr 3.3 instance vs 
> the same ported to Solr 4.0.
>
> Benchmarking is done using JMeter from localhost.
> Test scenario is a constant stream of queries from a log file out of 
> production, at targeted 50 QPS.
> After some time (marked in graph) I do a push via REST interface of 
> the whole index data (796M XML), wait some time and do an optimize via 
> REST.
>
> Testmachine is a VM on a "Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GH", 
> one core and 2Gb RAM attached.
> Both Solr instances are running in the same Tomcat and are not used 
> otherwise than testing.
>
> Expected results were a lower overall load for Solr 4 and a lower 
> latency while pushing new data.
>
> In the graph you can see high CPU load, all the time. This is even the 
> case if I reduce the QPS down to 5, so CPU is no good metric for 
> comparison between Solr 3.3 and 4.0 (at least on this machine).
> The missing memory data is due to the PerfMon JMeter Plugin having 
> time-outs sometimes.
>
> You can also see no real increase in latency when pushing data into 
> the index. This is puzzling me, as rumours say one should not push new 
> data while under high load, as this would hurt query performance.

I don't see any attachments, or any links to external attachments, so I 
can't see the graph.  I can only make general statements, and I can't 
guarantee that they'll even be applicable to your scenario.  You may 
need to use an external attachment service and just send us a link.

Are you seeing lower performance, or just worried about the CPU load?  
Solr4 should be able to handle concurrent indexing and querying better 
than 3.x.  It is able to do things concurrently that were not possible 
before.

One way that performance improvements happen is that developers find 
slow sections of code where the CPU is fairly idle, and rewrite them so 
they are faster, but also exercise the CPU harder.  When the new code 
runs, CPU load goes higher, but it all runs faster.

Thanks,
Shawn

Re: Benchmarking Solr 3.3 vs. 4.0

Posted by Shawn Heisey <so...@elyograg.org>.

On 11/29/2012 3:15 AM, Daniel Exner wrote:
> I'm currently doing some benchmarking of a real Solr 3.3 instance vs 
> the same ported to Solr 4.0.

Another note specifically related to this part: Have you used the same 
configuration and done the minimal changes required to make it run, or 
have you tried to update the config for 4.0 and its considerable list of 
new features?  Did you start with a blank index on 4.0, or did you copy 
the 3.3 index over?

There's no wrong answer to these questions.  Depending on exactly what 
you are trying to do, what is right for someone else may not be right 
for you.  The answers will help narrow the discussion.

Thanks,
Shawn