Posted to dev@lucene.apache.org by GitBox <gi...@apache.org> on 2019/03/01 15:08:37 UTC

[GitHub] gerlowskija commented on a change in pull request #594: SOLR-13259: Add new section on Reindexing in Solr

URL: https://github.com/apache/lucene-solr/pull/594#discussion_r261634857
 
 

 ##########
 File path: solr/solr-ref-guide/src/reindexing.adoc
 ##########
 @@ -0,0 +1,180 @@
+= Reindexing
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+There are several types of changes to Solr configuration that require you to reindex your data.
+
+These changes include editing properties of fields or field types; adding fields, field types, or copy field rules;
+upgrading Solr; and some system configuration properties.
+
+It's important to be aware that many changes require reindexing, because there are times when not reindexing
+can have negative consequences for Solr as a system, or for the ability of your users to find what they are looking for.
+
+There is no process in Solr for programmatically reindexing data. When we say "reindex", we mean, literally,
+"index it again". However you got the data into the index the first time, you will run that process again.
+
+Reindexing is recommended during upgrades, so in addition to covering what types of configuration changes should trigger a reindex, this section will also cover strategies for reindexing.
+
+== Changes that Require Reindex
+
+=== Schema Changes
+
+All changes to a collection's schema require reindexing. This is because many of the available options are only
+applied during the indexing process. Solr simply has no way to implement the desired change without reindexing
+the data.
+
+To understand the general reason why reindexing is ever required, it's helpful to understand the relationship between
+Solr's schema and the underlying Lucene index. Lucene does not use a schema; it is a Solr-only concept. When you delete
+a field from Solr's schema, it does not modify Lucene's index in any way. When you add a field to Solr's schema, the
+field does not exist in Lucene's index until a document that contains the field is indexed.
+
+This means that there are many types of schema changes that cannot be reflected in the index simply by modifying
+Solr's schema. This is different from most database models where schemas are used. With regard to indexing, Solr's
+schema acts like a rulebook for indexing documents by telling Lucene how to interpret the data being sent. Once the
+documents are in Lucene, Solr's schema has no control over the underlying data structure.
+
+==== Adding or Deleting Fields
+
+If you add or delete a field from Solr's schema, it's strongly recommended to reindex.
+
+When you add a field, you generally do so with the intent to use the field in some way.
+Since documents were indexed before the field was added, the index will not hold any references to the field for earlier documents.
+If you want to use the new field for faceting, for example, facets on the new field will not include any documents that were not indexed with the new field.
+
+There is a slightly different situation when deleting a field.
+In this case, since simply removing the field from the schema doesn't change anything about the index, the field will still be in the index until the documents are reindexed.
+In fact, Lucene may keep a reference to a deleted field _forever_ (see also https://issues.apache.org/jira/browse/LUCENE-1761[LUCENE-1761]).
+This may only be an issue for your environment if you try to add a field that has the same name a deleted field,
 
 Review comment:
   "same name as a"
