Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/02/20 02:11:11 UTC

[Solr Wiki] Update of "SolrPerformanceData" by TomBurtonWest

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrPerformanceData" page has been changed by TomBurtonWest.
http://wiki.apache.org/solr/SolrPerformanceData?action=diff&rev1=14&rev2=15

--------------------------------------------------

  
  == HathiTrust Large Scale Solr Benchmarking ==
  
+ [[http://www.hathitrust.org|HathiTrust]] ''makes the digitized collections of some of the nation’s great research libraries available for all.''  We currently have slightly over 5 million full-text books indexed.  Our production index is spread across 10 shards on 4 machines.  With a total index size of over 2 terabytes, our biggest bottleneck is disk I/O.  Using CommonGrams reduced disk I/O significantly, but it is still the main performance bottleneck.
- [[http://www.hathitrust.org|HathiTrust]] ''makes the digitized collections of some of the nation’s great research libraries available for all.''  We are planning to index 20 million full-text books in Solr. 
- Our current index for 1 million full text books is about 225GB and we are getting average response times of about 1/2 a second, but the 0.5% slowest queries are taking between 10 seconds and 2 minutes.  We are working on strategies to improve overall response time.
  
+ On our production index, the average Solr response time is around 200 ms, the median is 90 ms, the 90th percentile is about 450 ms, and the 99th percentile is about 1.4 seconds.  Details on the hardware are available at
+ [[http://www.hathitrust.org/blogs/large-scale-search/new-hardware-searching-5-million-volumes-full-text|New hardware for searching 5 million plus volumes]].  Some details on performance are available at [[http://www.hathitrust.org/blogs/large-scale-search/performance-5-million-volumes|Performance at 5 million volumes]].  Background and updates are available at [[http://www.hathitrust.org/blogs/large-scale-search|The HathiTrust Large Scale Search blog]].
- Our benchmarking efforts to date are reported in 
-  * [[http://www.hathitrust.org/large_scale_search|The HathiTrust Large Scale Search page]]
-  * [[http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf|Technical Report on Large Scale Search Benchmarking (pdf)]]
-  * [[http://www.hathitrust.org/blogs/large-scale-search|updates (including hardware information)]]
-  * [[http://www.hathitrust.org/documents/HathiTrust-DLFForum-200905.ppt|part of a panel presentation at the DLF (powerpoint)]]
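The new revision summarizes response times as an average, a median, and 90th/99th percentiles. The sketch below shows one way such figures could be computed from a list of per-query times, for example times pulled from a Solr log's QTime values. It is only a minimal sketch under stated assumptions: the function names and sample numbers are hypothetical, it uses the nearest-rank percentile definition (one of several in common use), and it is not HathiTrust's actual measurement method or data.

```python
from statistics import mean, median


def nearest_rank_percentile(sorted_times, p):
    """Return the p-th percentile (integer p in 1..100) of an ascending list,
    using the nearest-rank method: the value at rank ceil(p/100 * n)."""
    n = len(sorted_times)
    if n == 0:
        raise ValueError("no response times")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    rank = (p * n + 99) // 100  # integer ceiling of p*n/100, avoids float rounding
    return sorted_times[rank - 1]


def summarize(times_ms):
    """Hypothetical helper: summary statistics for per-query times in ms."""
    s = sorted(times_ms)
    return {
        "average": mean(s),
        "median": median(s),  # averages the two middle values for even n
        "p90": nearest_rank_percentile(s, 90),
        "p99": nearest_rank_percentile(s, 99),
    }


if __name__ == "__main__":
    # Made-up sample times, for illustration only.
    print(summarize([1400, 80, 450, 90, 120, 95]))
```

A long tail of slow queries pulls the average well above the median, which is why the percentile figures carry more information about what most users see than the average alone.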