Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2011/04/07 02:49:42 UTC

[Solr Wiki] Update of "SolrPerformanceData" by JayHill

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrPerformanceData" page has been changed by JayHill.
http://wiki.apache.org/solr/SolrPerformanceData?action=diff&rev1=18&rev2=19

--------------------------------------------------

  See also: SolrPerformanceFactors, BenchmarkingSolr
  
  = Solr Performance Data =
- 
  Solr users are encouraged to update this page to share any information they can about how they use Solr and what kind of performance they have observed.
  
  Please try to give as many specifics as you can regarding:
+ 
-   * The Hardware and OS you used
+  * The Hardware and OS you used
-   * The version of Solr you used
+  * The version of Solr you used
-   * The Servlet Container and JVM you used
+  * The Servlet Container and JVM you used
-   * Your index
+  * Your index
-   * The types of operations you tested (ie: updates, commits, optimizes, searchers -- the !RequestHandler used, etc...)
+  * The types of operations you tested (i.e. updates, commits, optimizes, searches -- the !RequestHandler used, etc.)
-   * What's your greatest performance bottleneck: CPU? Disk speed? RAM?
+  * What's your greatest performance bottleneck: CPU? Disk speed? RAM?
  
  See also: SolrPerformanceFactors
  
- See also: [[http://lucene.apache.org/java/docs/benchmarks.html|Lucene's benchmark page]] and this page on hardware considerations [[http://wiki.statsbiblioteket.dk/summa/Hardware|from Summa]] (which is also based on Lucene)
+ See also: [[http://lucene.apache.org/java/2_4_0/benchmarks.html|Lucene's benchmark page]] and this page on hardware considerations [[http://wiki.statsbiblioteket.dk/summa/Hardware|from Summa]] (which is also based on Lucene)
  
  == CNET Shopper.com ==
- 
  The numbers below are from testing done by CNET prior to launching a Solr-powered [[http://www.shopper.com|Shopper.com]] search page.  Shopper.com uses a modified version of the DisMaxRequestHandler which also does some faceted searching to pick categories for the page navigation options.  On a typical request, the handler fetches the !DocSets for 1500-2000 queries and intersects each with the !DocSet for the main search results.
  
- The plugin itself uses configuration nearly identical to the DisMaxRequestHandler.  To give you an idea of the types of queries that it generates: 
+ The plugin itself uses configuration nearly identical to the DisMaxRequestHandler.  To give you an idea of the types of queries that it generates:
  
   * The qf param is used to search across 10-15 fields with various boosts.
   * The pf param is used to phrase search across 10-15 fields with various boosts.
@@ -30, +29 @@

   * The bf param contains two separate boosting functions, one of which contains two nested functions.
   * The fq param is used to filter out ~15% of the records that we don't want to ever surface.
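 For reference, a handler exposing this kind of setup would be declared in solrconfig.xml roughly as follows.  The field names, boosts, and filter query below are invented for illustration; CNET's actual configuration was not published:
 
 {{{
 <!-- hypothetical fields and boosts; not CNET's actual configuration -->
 <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <!-- query fields with boosts -->
     <str name="qf">name^2.0 brand^1.5 description^0.5</str>
     <!-- phrase fields with boosts -->
     <str name="pf">name^3.0 description^1.0</str>
     <!-- two additive boost functions, one containing nested functions -->
     <str name="bf">recip(rord(launch_date),1,1000,1000)^0.5 popularity^0.3</str>
     <!-- filter out records that should never surface -->
     <str name="fq">status:active</str>
   </lst>
 </requestHandler>
 }}}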
  
- The index used in these tests contained ~400K records, took up ~900MB of disk, and was fully optimized. 
+ The index used in these tests contained ~400K records, took up ~900MB of disk, and was fully optimized.
  
- During the tests, a cron job forcibly triggered a commit (even though the index hadn't changed) every 15 minutes to force a new searcher to be opened and autowarmed while the queries were being processed.  
+ During the tests, a cron job forcibly triggered a commit (even though the index hadn't changed) every 15 minutes to force a new searcher to be opened and autowarmed while the queries were being processed.
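+ A commit can be forced from outside Solr by POSTing a small XML update message to the update URL, which is the sort of thing a cron job can do on a schedule.  The URL and flags below are illustrative, not necessarily what CNET used:
+ 
+ {{{
+ <!-- POSTed to http://localhost:8983/solr/update every 15 minutes -->
+ <commit waitFlush="true" waitSearcher="true"/>
+ }}}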
  
  Solr was running on a dual 2.4GHz Opteron (DL385) with 16GB of memory, under Linux (kernel 2.6.9), using Resin 3.0.??.  (I don't know the specific Resin release or JVM options used)
  
@@ -40, +39 @@

  
  {{{
   Number of Concurrent Clients:   1       2       4       6
-        
+ 
-      Throughput (queries/sec):  33.9    49.2    58.2    60.1 
+      Throughput (queries/sec):  33.9    49.2    58.2    60.1
       Avg Response Time (secs):   0.030   0.041   0.069   0.100
-     
+ 
       99.9th percentile (secs):   0.456   0.695   1.015   1.418
         99th percentile (secs):   0.245   0.301   0.496   0.661
         98th percentile (secs):   0.173   0.225   0.367   0.486
         95th percentile (secs):   0.095   0.124   0.220   0.323
         75th percentile (secs):   0.027   0.040   0.072   0.108
         50th percentile (secs):   0.017   0.024   0.042   0.063
+ }}}
- }}}    
- 
  Mailing list post [[http://www.nabble.com/forum/ViewPost.jtp?post=4487784&framed=y|"Two Solr Announcements: CNET Product Search and DisMax"]] describes a little more about Solr and CNET.
  
  == Netflix ==
- 
  Walter Underwood reports that [[http://www.netflix.com|Netflix]]'s site search switched to being powered by Solr the week of 9/17/07:
  
-  Here at Netflix, we switched over our site search to Solr two weeks ago. We've seen zero problems with the server. We average 1.2 million queries/day on a 250K item index. We're running four Solr servers with simple round-robin HTTP load-sharing.
+  . Here at Netflix, we switched over our site search to Solr two weeks ago. We've seen zero problems with the server. We average 1.2 million queries/day on a 250K item index. We're running four Solr servers with simple round-robin HTTP load-sharing. This is all on 1.1. I've been too busy tuning to upgrade.
- 
-  This is all on 1.1. I've been too busy tuning to upgrade.
  
  (See http://www.nabble.com/forum/ViewPost.jtp?post=13009485&framed=y)
  
  Walter also reported some figures from their testing phase:
  
-  We are searching a much smaller collection, about 250K docs, with great success. We see 80 queries/sec on each of four servers, and response times under 100ms. Each query searches against seven fields.
+  . We are searching a much smaller collection, about 250K docs, with great success. We see 80 queries/sec on each of four servers, and response times under 100ms. Each query searches against seven fields.
  
  At least for these test figures, they were not using fuzzy search, facets, or highlighting.
  
  (See http://www.nabble.com/forum/ViewPost.jtp?post=12906462&framed=y)
  
  == Discogs.com ==
- 
- Solr powers keyword search on [[http://www.discogs.com/|Discogs.com]]. From the [[http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200611.mbox/%3c3f732c0b0611060921q1c67185fkb454a901a6abb998@mail.gmail.com%3e|email archive]] ([[http://www.nabble.com/forum/ViewPost.jtp?post=7203032&framed=y|alternate copy on nabble]])...
+ Solr powers keyword search on [[http://www.discogs.com/|Discogs.com]]. From the [[http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200611.mbox/<3f...@mail.gmail.com>|email archive]] ([[http://www.nabble.com/forum/ViewPost.jtp?post=7203032&framed=y|alternate copy on nabble]])...
  
  {{{
  I've been using Solr for keyword search on Discogs.com for a few
@@ -86, +80 @@

  queries per day with no problem. CPU load stays around 0.15 most of
  the time.
  }}}
+ == HathiTrust Large Scale Solr Benchmarking ==
+ [[http://www.hathitrust.org|HathiTrust]] ''makes the digitized collections of some of the nation’s great research libraries available for all.''  We currently have slightly over 5 million full-text books indexed.  Our production index is spread across 10 shards on 4 machines. With a total index size of over 2 Terabytes, our biggest bottleneck is disk I/O.  We did reduce that significantly using CommonGrams, but disk I/O is still the bottleneck for performance.
  
- == HathiTrust Large Scale Solr Benchmarking ==
- 
- [[http://www.hathitrust.org|HathiTrust]] ''makes the digitized collections of some of the nation’s great research libraries available for all.''  We currently have slightly over 5 million full-text books indexed.  Our production index is spread across 10 shards on 4 machines. With a total index size of over 2 Terabytes, our biggest bottleneck is disk I/O.  We did reduce that significantly using CommonGrams, but disk I/O is still the bottleneck for performance. 
- 
- On our production index, the average Solr response time is around 200 ms, median response time 90 ms, 90th percentile about 450 ms, and 99th percentile about 1.4 seconds.  Details on the hardware are available at
- [[http://www.hathitrust.org/blogs/large-scale-search/new-hardware-searching-5-million-volumes-full-text|New hardware for searching 5 million plus volumes]]  Some details on performance are available at: [[http://www.hathitrust.org/blogs/large-scale-search/performance-5-million-volumes|Performance at 5 million volumes]].  Background and updates available at:[[http://www.hathitrust.org/blogs/large-scale-search|The HathiTrust Large Scale Search blog]]  
+ On our production index, the average Solr response time is around 200 ms, median response time 90 ms, 90th percentile about 450 ms, and 99th percentile about 1.4 seconds.  Details on the hardware are available at [[http://www.hathitrust.org/blogs/large-scale-search/new-hardware-searching-5-million-volumes-full-text|New hardware for searching 5 million plus volumes]].  Some details on performance are available at [[http://www.hathitrust.org/blogs/large-scale-search/performance-5-million-volumes|Performance at 5 million volumes]].  Background and updates are available at [[http://www.hathitrust.org/blogs/large-scale-search|The HathiTrust Large Scale Search blog]].
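+ CommonGrams reduces disk I/O for phrase queries by indexing frequent word pairs as single tokens.  A schema.xml analysis chain using it might look like the following sketch (the field type name, tokenizer choice, and common-words file are placeholders, not HathiTrust's actual schema):
+ 
+ {{{
+ <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
+   <analyzer type="index">
+     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+     <filter class="solr.LowerCaseFilterFactory"/>
+     <!-- index frequent word pairs like "of_the" as single tokens -->
+     <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
+   </analyzer>
+   <analyzer type="query">
+     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+     <filter class="solr.LowerCaseFilterFactory"/>
+     <!-- at query time, prefer the pair tokens for phrase queries -->
+     <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
+   </analyzer>
+ </fieldType>
+ }}}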
  
  == Zvents ==
- 
  [[http://www.zvents.com|Zvents]] serves more than 8 million users monthly with engaging local content.  We've used Solr for several years and have achieved very high performance and reliability.  User queries are served by a cluster of 8 machines, each having 16GB of memory and 4 cores.  Our search index contains over 4 million documents.  An average weekday sees a maximum of 80 qps with an average latency of 40ms.  Leading up to New Year's, we'll see ten times this level.  To support huge fluctuations in our capacity needs, we run a nightly load test against a single production-class machine.  The load test itself uses JMeter, a copy of production access logs, and a copy of the production index.  The load-testing machine is subjected to 130 qps and delivers an average latency of 150ms.