Posted to solr-user@lucene.apache.org by Ryan Wilson <rw...@leadsonline.com> on 2011/07/06 20:40:42 UTC

Improving Indexing Performance for a 200 million+ record index

Hello all,



A colleague and I have a working Solr 3.2 installation in which we use DataImporter to index our 200+ million record database. The indexer runs on an internally hosted virtual instance with 40 GB of RAM and 2 cores assigned to it. At one point a full index took 12 hours. After reading an article (http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/), we went through our schema, determined which fields did not need term vectors, positions, etc., and removed those options. A full index now takes 16 hours or longer. We have rolled back our schema file to get our previous indexing times back, but we are left curious as to how changes that should have reduced indexing time instead seem to have increased it. I have attached our schema file from before and after the changes. Any information would be appreciated.



Kind regards,

Ryan Wilson