Posted to commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2015/08/04 19:44:33 UTC

[Solr Wiki] Update of "HowToReindex" by ShawnHeisey

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "HowToReindex" page has been changed by ShawnHeisey:
https://wiki.apache.org/solr/HowToReindex?action=diff&rev1=8&rev2=9

Comment:
Added note about why reindexing is necessary.

  Indexing is something that can be manually done by a person or automatically done by a program, but it is always external to Solr.  There is an issue in the bugtracker for adding dataimport handler scheduling to Solr, but it is meeting with committer resistance, because *ALL* modern operating systems have a scheduling capability built in.  Also, that would mean that Solr can change your index without external action, which is generally considered a bad idea by committers.
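
Because indexing is driven from outside Solr, one common arrangement is to have the operating system's scheduler (cron, for example) run a small script that asks the dataimport handler for a full import. The following is only a sketch under assumptions that are not stated on this page: the core URL is a placeholder, and the handler is assumed to be registered at the conventional `/dataimport` path in solrconfig.xml, which may not match your setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_full_import_url(core_url, handler="dataimport", clean=True):
    """Return the URL that asks the dataimport handler for a full import.

    core_url and handler are assumptions; use whatever your solrconfig.xml
    actually registers. clean=true asks the handler to remove existing
    documents before importing.
    """
    params = {"command": "full-import", "clean": "true" if clean else "false"}
    return "{}/{}?{}".format(core_url.rstrip("/"), handler.strip("/"), urlencode(params))


def trigger_full_import(core_url):
    """Send the request and return the HTTP status code.

    Intended to be called from a script that the OS scheduler runs.
    """
    with urlopen(build_full_import_url(core_url)) as response:
        return response.status
```

A scheduler entry would then only need to invoke a script that calls `trigger_full_import` with the URL of your own core.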
  
  Depending on your setup and goals, you may need to delete all documents before you begin your indexing process.  Sometimes it is necessary to delete your index directory entirely before you restart Solr or reload your core.
+ 
+ It's reasonable to wonder why deleting the existing data and building it again is necessary.  Here's why:  When you change your schema, nothing happens to the existing data in the index.  When Solr accesses that data, it uses the current schema as a guide to interpreting it.  If the index contains documents with a field that was built with the SortableIntField class, and Solr then tries to read that field with a different class (such as TrieIntField), there's a good chance that an unrecoverable error will occur.
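
The "delete all documents" step mentioned above is usually done by posting a delete-by-query for `*:*` to the core's update handler, followed by a commit. Here is a minimal sketch, assuming a core reachable at a placeholder URL such as `http://localhost:8983/solr/collection1` and an update handler at the usual `/update` path; both are assumptions and need adjusting for your installation.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# XML update message that matches, and therefore deletes, every document.
DELETE_ALL_BODY = b"<delete><query>*:*</query></delete>"


def build_delete_all_request(core_url):
    """Build (but do not send) a POST that deletes all documents and commits.

    core_url is a placeholder for the base URL of your own core.
    """
    url = core_url.rstrip("/") + "/update?" + urlencode({"commit": "true"})
    return Request(
        url,
        data=DELETE_ALL_BODY,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def delete_all_documents(core_url):
    """Send the delete request and return the HTTP status code."""
    with urlopen(build_delete_all_request(core_url)) as response:
        return response.status
```

This only empties the index through the API. It does not cover the case where the index directory itself has to be removed before a restart or core reload.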
  
  == Using Solr as a Data Source ==
  Don't do this unless you have no other option.  Solr is not really designed for this role.  Every attempt is made to ensure that Solr is stable, but indexes do get corrupted by unanticipated situations, and by things completely outside developer control.  Solr 4.x and later does have NoSQL features, and SolrCloud goes a long way towards high availability, but absolute data reliability in the face of any problem is difficult to achieve for any software, which is why it's always important to have backups.