Posted to commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/19 22:54:03 UTC

[Solr Wiki] Update of "SolrPerformanceData" by TokeEskildsen

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrPerformanceData" page has been changed by TokeEskildsen:
https://wiki.apache.org/solr/SolrPerformanceData?action=diff&rev1=27&rev2=28

Comment:
Added section on the Danish Web Archive

  == Zvents ==
  [[http://www.zvents.com|Zvents]] serves more than 8 million users monthly with engaging local content.  We've used Solr for several years and have achieved very high performance and reliability.  User queries are served by a cluster of 8 machines, each having 16GB of memory and 4 cores.  Our search index contains over 4 million documents.  An average weekday sees a maximum of 80qps with an average latency of 40ms.  Leading up to New Year's, we'll see ten times this level.  To support huge fluctuations in our capacity needs, we run a nightly load test against a single production-class machine.  The load test itself uses JMeter, a copy of production access logs, and a copy of the production index.  The load testing machine is subjected to 130qps and delivers an average latency of 150ms.
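  
  A rough sketch of the idea behind such a replay-based test, written as a single-threaded Java/SolrJ latency check rather than the actual JMeter test plan (host name, core name and log file are hypothetical):
  {{{
  import java.io.BufferedReader;
  import java.io.FileReader;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  
  public class LogReplayCheck {
      public static void main(String[] args) throws Exception {
          // Hypothetical test instance; the real test drives a production-class machine at 130qps with JMeter.
          HttpSolrServer solr = new HttpSolrServer("http://loadtest:8983/solr/events");
          long totalMs = 0, count = 0;
          try (BufferedReader log = new BufferedReader(new FileReader("production_queries.txt"))) {
              String q;
              while ((q = log.readLine()) != null) {  // one query string per line, taken from access logs
                  long start = System.currentTimeMillis();
                  solr.query(new SolrQuery(q));
                  totalMs += System.currentTimeMillis() - start;
                  count++;
              }
          }
          System.out.println("Average latency: " + (totalMs / Math.max(count, 1)) + "ms over " + count + " queries");
      }
  }
  }}}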
  
+ == Danish Web Archive ==
+ [[http://netarkivet.dk/|Netarkivet]] is the national Danish Web Archive with 500TB+ of harvested web resources. We are using Tika to index this into Solr 4.8 in shards of 900GB / 300M documents. A single 24-core 256GB CentOS machine builds the shards, each of which takes about 8 days. Nearly all the CPU power is spent on the Tika processes, with the Solr indexer easily keeping up. Each shard is optimized down to a single segment. The machine has ~170GB free memory and would likely work just as well with a total memory of 128GB or less. The Solr indexer has a 32GB heap, which is needed for the final optimization step. Currently (2014-09-19) there are 17 finished shards for a total of 15TB / 5 billion documents.
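+ 
+ A minimal sketch of such a Tika-extraction plus SolrJ-indexing pipeline, ending with the merge down to a single segment (the Solr URL, field names and document id are assumptions for illustration, not the archive's actual code):
+ {{{
+ import java.io.FileInputStream;
+ import java.io.InputStream;
+ import org.apache.solr.client.solrj.SolrServer;
+ import org.apache.solr.client.solrj.impl.HttpSolrServer;
+ import org.apache.solr.common.SolrInputDocument;
+ import org.apache.tika.metadata.Metadata;
+ import org.apache.tika.parser.AutoDetectParser;
+ import org.apache.tika.sax.BodyContentHandler;
+ 
+ public class ShardBuilder {
+     public static void main(String[] args) throws Exception {
+         // Hypothetical single-shard indexing instance on the shard-building machine.
+         SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/shard");
+         AutoDetectParser parser = new AutoDetectParser();
+ 
+         for (String path : args) {                        // paths to harvested resources
+             Metadata meta = new Metadata();
+             BodyContentHandler text = new BodyContentHandler(-1); // no write limit
+             try (InputStream in = new FileInputStream(path)) {
+                 parser.parse(in, text, meta);             // Tika extraction is the CPU-heavy part
+             }
+             SolrInputDocument doc = new SolrInputDocument();
+             doc.addField("id", path);
+             doc.addField("content", text.toString());
+             doc.addField("content_type", meta.get(Metadata.CONTENT_TYPE));
+             solr.add(doc);                                // the Solr indexer easily keeps up
+         }
+         solr.commit();
+         // Final step: force-merge the shard down to a single segment.
+         // This is the step that needs the large (32GB) indexer heap.
+         solr.optimize(true, true, 1);
+     }
+ }
+ }}}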
+ 
+ Search is handled by a single dedicated 16-core CentOS machine with 256GB RAM (currently ~130GB free for disk cache). Each shard has its own Solr instance running in Tomcat with an Xmx of 9GB and resides on a dedicated SSD (Samsung 840). All instances are part of a single SolrCloud. There are 25 SSDs in the machine, and it is currently undecided whether the setup will be scaled up (more RAM, more SSDs) or out (another similar machine) when we reach that number of shards.
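+ 
+ Clients see the shard instances as one logical collection. A sketch of a distributed query against such a setup with the Solr 4.x SolrJ API (the ZooKeeper addresses and collection name are assumptions for illustration):
+ {{{
+ import org.apache.solr.client.solrj.SolrQuery;
+ import org.apache.solr.client.solrj.impl.CloudSolrServer;
+ import org.apache.solr.client.solrj.response.QueryResponse;
+ 
+ public class ArchiveSearch {
+     public static void main(String[] args) throws Exception {
+         // Hypothetical ZooKeeper ensemble and collection name.
+         CloudSolrServer cloud = new CloudSolrServer("zoo1:2181,zoo2:2181,zoo3:2181");
+         cloud.setDefaultCollection("netarchive");
+ 
+         SolrQuery query = new SolrQuery(args.length > 0 ? args[0] : "*:*");
+         query.setRows(10);
+         QueryResponse response = cloud.query(query);   // fans out to all shard instances
+         System.out.println("Hits: " + response.getResults().getNumFound());
+         cloud.shutdown();
+     }
+ }
+ }}}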
+ 
+ Access to the library is limited and at most 2 or 3 users are active at a time. Searches are faceted on 6 fields: URL (nearly 5 billion unique values), domain, host and 3 smaller ones. With Solr 4.8 and [[https://issues.apache.org/jira/browse/SOLR-5894|SOLR-5894]], median and average response times during load testing are currently below ½ second and are expected to stay below 1 second as the index grows. Special searches, such as *:*, take up to 2 minutes. For non-faceted searches, IOWait gets as high as 10%. For faceted searches, IOWait stays below 0.5% and CPU load is high. It has not been determined whether the high CPU load is due to processing (easily scalable) or memory access congestion (not easily scalable). See [[http://sbdevel.wordpress.com/2014/09/11/even-sparse-faceting-is-limited/|Even sparse faceting is limited]] for the most recent performance figures.
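+ 
+ A sketch of a faceted request of the kind described above, using standard SolrJ faceting parameters against the same assumed collection as in the previous example (the field names for the 3 smaller facets are guesses; the sparse faceting from SOLR-5894 is controlled by additional request parameters not shown here):
+ {{{
+ import org.apache.solr.client.solrj.SolrQuery;
+ import org.apache.solr.client.solrj.impl.CloudSolrServer;
+ import org.apache.solr.client.solrj.response.FacetField;
+ import org.apache.solr.client.solrj.response.QueryResponse;
+ 
+ public class FacetedArchiveSearch {
+     public static void main(String[] args) throws Exception {
+         CloudSolrServer cloud = new CloudSolrServer("zoo1:2181,zoo2:2181,zoo3:2181"); // assumed ZooKeeper ensemble
+         cloud.setDefaultCollection("netarchive");                                     // assumed collection name
+ 
+         SolrQuery query = new SolrQuery(args.length > 0 ? args[0] : "*:*");
+         query.setRows(10);
+         query.setFacet(true);
+         // "url" is the high-cardinality field (~5 billion unique values); the last 3 names are guesses.
+         query.addFacetField("url", "domain", "host", "content_type", "crawl_year", "status_code");
+         query.setFacetLimit(20);
+         query.setFacetMinCount(1);
+ 
+         QueryResponse response = cloud.query(query);
+         for (FacetField field : response.getFacetFields()) {
+             System.out.println(field.getName());
+             for (FacetField.Count count : field.getValues()) {
+                 System.out.println("  " + count.getName() + ": " + count.getCount());
+             }
+         }
+         cloud.shutdown();
+     }
+ }
+ }}}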
+