Posted to solr-user@lucene.apache.org by Joe Lerner <jo...@gmail.com> on 2018/08/03 15:09:12 UTC

Schema Change for Solr 7.4

We recently set up Solr 7.4 in Production. There are 2 Solr nodes, with 3
zookeepers. We need to make a schema change. What I want to do is simply
push the updated schema to Solr, and then re-index all the content to pick
up the change. But I am being told that I need to:

1.	Delete the collection that depends on this config-set.
2.	Reload the config-set
3.	Recreate the dependent collection

It seems to me that between steps #1 and #3, users will not be able to
search, which is not cool.

Can I avoid the outage to my search capability?

Thanks!

Joe



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Schema Change for Solr 7.4

Posted by Jan Høydahl <ja...@cominvent.com>.
Aliases are like pointers to collections: an alias can be used anywhere you'd use a collection name. You create one with the Collections API CREATEALIAS action.
See https://lucene.apache.org/solr/guide/7_4/collections-api.html#createalias
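A minimal sketch of what that call might look like, assuming a Solr node at http://localhost:8983/solr and hypothetical names (alias "products", collection "products_v2"). It only builds the CREATEALIAS request URL; sending it with an HTTP client is left out.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Build a Collections API CREATEALIAS URL. Nothing is sent here."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # An alias may point at one or more collections, comma-separated.
        "collections": ",".join(collections),
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Hypothetical base URL and names, for illustration only.
url = create_alias_url("http://localhost:8983/solr", "products", ["products_v2"])
print(url)
```

Once the alias exists, clients would query /solr/products/select just as they would a collection of that name.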

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 13 Aug 2018, at 16:46, THADC <ti...@gmail.com> wrote:
> 
> Hi Shawn, thanks for this response. We are probably going to take your
> suggested approach:
> 
> 1. Upload a new configset to ZooKeeper. 
> 2. Create a new collection using the new configset. 
> 3. Index data into the new collection. 
> 4. Set up an alias with the original collection name, pointing at the 
> new collection. 
> 5. When you're sure it's good, delete the old collection. 
> 
> I have a question about step 4. What is the actual mechanism for the
> aliasing? Is the alias something that would be defined in the schema.xml
> file, or are you speaking more generally about something that would be
> crafted in our application code or even like a sym link at the operating
> system level?
> 
> Thanks, Tim
> 


Re: Schema Change for Solr 7.4

Posted by THADC <ti...@gmail.com>.
Hi Shawn, thanks for this response. We are probably going to take your
suggested approach:

1. Upload a new configset to ZooKeeper. 
2. Create a new collection using the new configset. 
3. Index data into the new collection. 
4. Set up an alias with the original collection name, pointing at the 
new collection. 
5. When you're sure it's good, delete the old collection. 

I have a question about step 4. What is the actual mechanism for the
aliasing? Is the alias something that would be defined in the schema.xml
file, or are you speaking more generally about something that would be
crafted in our application code or even like a sym link at the operating
system level?

Thanks, Tim




Re: Schema Change for Solr 7.4

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/3/2018 9:09 AM, Joe Lerner wrote:
> We recently set up Solr 7.4 in Production. There are 2 Solr nodes, with 3
> zookeepers. We need to make a schema change. What I want to do is simply
> push the updated schema to Solr, and then re-index all the content to pick
> up the change. But I am being told that I need to:
>
> 1.	Delete the collection that depends on this config-set.
> 2.	Reload the config-set
> 3.	Recreate the dependent collection
>
> It seems to me that between steps #1 and #3, users will not be able to
> search, which is not cool.

Here's a procedure that should work for most situations:

1. Upload a new configset to ZooKeeper.
2. Create a new collection using the new configset.
3. Index data into the new collection.
4. Set up an alias with the original collection name, pointing at the
new collection.
5. When you're sure it's good, delete the old collection.

Step 4 redirects requests sent to the original collection name so that
they end up on the new collection.
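The Collections API calls behind steps 2, 4, and 5 can be sketched roughly as follows. This is an illustration under assumptions, not a tested runbook: the base URL, the collection names ("products_v1", "products_v2"), the alias name ("products"), the configset name, and the shard and replica counts are all hypothetical. Step 1 (uploading the configset) and step 3 (indexing) are assumed to happen outside this code. How an alias behaves when a collection of the same name still exists is worth checking in the Reference Guide for your version.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL of any Solr node


def collections_api(action, **params):
    """Build a Collections API URL for the given action. Nothing is sent."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


def migration_steps(alias, old, new, configset, shards=1, replicas=2):
    """Return the request URLs for steps 2, 4 and 5, in order."""
    return [
        # Step 2: create the new collection from the already-uploaded configset.
        collections_api(
            "CREATE",
            name=new,
            numShards=shards,
            replicationFactor=replicas,
            **{"collection.configName": configset},
        ),
        # Step 4: point the alias at the new collection (after indexing).
        collections_api("CREATEALIAS", name=alias, collections=new),
        # Step 5: drop the old collection once the new one is verified.
        collections_api("DELETE", name=old),
    ]


steps = migration_steps("products", "products_v1", "products_v2", "products_conf_v2")
for step in steps:
    print(step)
```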

========

Whether you need to delete the data before reindexing into the same
collection depends on the precise nature of the change to your schema. 
Some schema changes require not only deleting all data in the
collection, but actually deleting the entire index directory in every
shard replica to remove all traces of the old data.  Can you give
precise details about what change you are planning to the schema?

If you can be absolutely sure that there are no commits happening with
openSearcher set to true and your schema change is safe for the existing
index, then you can use the following procedure.  Note that if anything
goes wrong or the wrong kind of commit occurs during this, your users
will be searching incomplete data:

1. Change the schema.
2. Reload the collection.
3. Delete all documents.
4. Index your data.
5. Issue a commit to make the changes visible.
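As a rough sketch of steps 2, 3, and 5, the requests could be built like this. The base URL and the collection name "products" are assumptions. The point it tries to show is that the delete request carries no commit parameter, so the only explicit commit is the final one; any automatic commits configured in solrconfig.xml are outside what this code controls.

```python
import json
from urllib.parse import urlencode


def reload_url(base, collection):
    """Step 2: Collections API RELOAD so the changed schema is picked up."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base}/admin/collections?{query}"


def delete_all_request(base, collection):
    """Step 3: delete-by-query for every document, deliberately without commit."""
    url = f"{base}/{collection}/update"
    body = json.dumps({"delete": {"query": "*:*"}})
    return url, body


def final_commit_url(base, collection):
    """Step 5: a single explicit commit once indexing has finished."""
    return f"{base}/{collection}/update?commit=true"


base = "http://localhost:8983/solr"  # assumed base URL
delete_url, delete_body = delete_all_request(base, "products")
print(reload_url(base, "products"))
print(delete_url, delete_body)
print(final_commit_url(base, "products"))
```

The delete body would be POSTed with Content-Type application/json. Indexing requests in step 4 would likewise need to leave out commit parameters for the old data to stay visible until the end.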

Thanks,
Shawn


Re: Schema Change for Solr 7.4

Posted by Walter Underwood <wu...@wunderwood.org>.
For an in-place migration:

1. Add new fields to the schema.
2. Reindex to populate those fields.
3. Change queries to use those fields and stop using old fields.
4. Stop sending data to old fields, reindex.
5. Remove old fields from the schema.
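If the collection uses a managed schema, steps 1 and 5 could be done through the Schema API instead of editing the schema file by hand. The sketch below only builds the JSON command bodies that would be POSTed to /solr/<collection>/schema. The field names and the "text_general" field type are hypothetical, and with a classic schema.xml you would edit the file and upload the configset instead.

```python
import json


def add_field_command(name, field_type, stored=True, indexed=True):
    """Step 1: Schema API command that adds a new field."""
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    })


def delete_field_command(name):
    """Step 5: Schema API command that removes an old, unused field."""
    return json.dumps({"delete-field": {"name": name}})


# Hypothetical field names, for illustration only.
print(add_field_command("title_v2", "text_general"))
print(delete_field_command("title"))
```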

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 3, 2018, at 8:48 AM, Christopher Schultz <ch...@christopherschultz.net> wrote:
> 
> Joe,
> 
> On 8/3/18 11:44 AM, Joe Lerner wrote:
>> OK--yes, I can see how that would work. But it would require some quick
>> infrastructure flexibility that, at least to this point, we don't really
>> have.
> 
> The only thing that needs swapping is the URL that your application uses
> to connect to Solr, so you don't need anything terribly complicated to
> proxy it.
> 
> Something like Squid would work, and you'd only have a few seconds of
> downtime to set it up initially, and then another few seconds to swap later.
> 
> Heck, you can even remove the proxy after you are all done. It doesn't
> have to be a permanent fixture in your infrastructure.
> 
> -chris
> 


Re: Schema Change for Solr 7.4

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Joe,

On 8/3/18 11:44 AM, Joe Lerner wrote:
> OK--yes, I can see how that would work. But it would require some quick
> infrastructure flexibility that, at least to this point, we don't really
> have.

The only thing that needs swapping is the URL that your application uses
to connect to Solr, so you don't need anything terribly complicated to
proxy it.

Something like Squid would work, and you'd only have a few seconds of
downtime to set it up initially, and then another few seconds to swap later.

Heck, you can even remove the proxy after you are all done. It doesn't
have to be a permanent fixture in your infrastructure.

-chris


Re: Schema Change for Solr 7.4

Posted by Joe Lerner <jo...@gmail.com>.
OK--yes, I can see how that would work. But it would require some quick
infrastructure flexibility that, at least to this point, we don't really
have.

Joe




Re: Schema Change for Solr 7.4

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Joe,

On 8/3/18 11:09 AM, Joe Lerner wrote:
> We recently set up Solr 7.4 in Production. There are 2 Solr nodes, with 3
> zookeepers. We need to make a schema change. What I want to do is simply
> push the updated schema to Solr, and then re-index all the content to pick
> up the change. But I am being told that I need to:
> 
> 1.	Delete the collection that depends on this config-set.
> 2.	Reload the config-set
> 3.	Recreate the dependent collection
> 
> It seems to me that between steps #1 and #3, users will not be able to
> search, which is not cool.
> 
> Can I avoid the outage to my search capability?

I dunno about how to do any online-updates like this, but you could
always instead:

0. place a proxy between your application and Solr
1. stand-up a new service
2. load the config-set
3. create the collection
4. load all the data from source
5. swap the service at the proxy to the newly-created service

-chris