Posted to solr-user@lucene.apache.org by "Zimmermann, Thomas" <tz...@techtarget.com> on 2018/06/29 21:26:47 UTC

Managed Schemas and Version Control

Hi,

We're transitioning from Solr 4.10 to 7.x and working through our options around managing our schemas. Currently we manage our schema files in a git repository, make changes to the xml files, and then push them out to our zookeeper cluster via the zkcli and the upconfig command like:

/apps/solr/bin/zkcli.sh -cmd upconfig -zkhost host.com:9580 -collection core -confname core -confdir /apps/solr/cores/core/conf/ -solrhome /apps/solr/

This allows us to deploy schema changes without restarting the cluster, while maintaining version control. It looks like we could do the exact same process using Solr 7 and the solr control script like

bin/solr zk upconfig -z 111.222.333.444:2181 -n mynewconfig -d /path/to/configset

Now of course we'd like to improve this process if possible, since manually pushing schema files to the ZK server and reloading the cores is a bit command-line intensive. Does anyone have any guidance or experience leveraging the managed schema API to make updates to a schema in production while maintaining a version-controlled copy of the schema? I'd considered using the API to make changes to our schemas and then saving off the generated schema file to git, or saving off to git a script that builds the schema through the managed API, but I'm not sure if that is any easier or just adds complexity.
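For reference, the "use the API to make changes" option would look something like this. The `add-field` command and the `/solr/<collection>/schema` endpoint are the real Schema API; the field, the collection name "core", and the helper function are placeholders of mine, not anything from our setup.

```shell
# build_add_field NAME TYPE -> JSON payload for the Schema API's
# add-field command (real command; the field itself is made up).
build_add_field() {
  printf '{"add-field":{"name":"%s","type":"%s","stored":true}}' "$1" "$2"
}

# Only fires when pointed at a live Solr via SOLR_URL; "core" is a
# placeholder collection name.
if [ -n "${SOLR_URL:-}" ]; then
  curl -X POST -H 'Content-type: application/json' \
    --data-binary "$(build_add_field example_field text_general)" \
    "$SOLR_URL/solr/core/schema"
fi
```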

Any thoughts or experience appreciated.

Thanks,
TZ

Re: Managed Schemas and Version Control

Posted by "Zimmermann, Thomas" <tz...@techtarget.com>.
Thanks all! I think we will maintain our current approach of hand editing
the configs in git and implement something at the shell level to automate
the process of running upconfig and performing a core reload.
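A minimal sketch of that shell-level automation, assuming the Solr 7 control script and a Collections API reload via curl (hosts, names, and paths are placeholders, not our actual script):

```shell
ZK_HOST="${ZK_HOST:-localhost:2181}"
SOLR_URL="${SOLR_URL:-http://localhost:8983}"

# reload_url COLLECTION -> Collections API RELOAD endpoint
reload_url() {
  printf '%s/solr/admin/collections?action=RELOAD&name=%s' "$SOLR_URL" "$1"
}

# upconfig_and_reload CONFNAME CONFDIR
# Push a config from the git working tree to ZooKeeper, then reload
# the matching collection so the change takes effect.
upconfig_and_reload() {
  bin/solr zk upconfig -z "$ZK_HOST" -n "$1" -d "$2" || return 1
  curl -s "$(reload_url "$1")"
}
```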


Re: Managed Schemas and Version Control

Posted by Walter Underwood <wu...@wunderwood.org>.
I wrote a Python program that:

1. Gets a cluster status.
2. Extracts the Zookeeper location from that.
3. Uploads solr.xml and config to Zookeeper (using kazoo library).
4. Sends an async reload command.
5. Polls for success until all the nodes have finished the reload.
6. Optionally rebuilds the suggester.
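In curl terms the sequence maps onto the standard Collections API like this (the actual program is Python with kazoo, so this is only an approximation; step 3's ZooKeeper upload has no curl equivalent, and `bin/solr zk upconfig` would be a stand-in; all names are placeholders):

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983}"

# Steps 1-2: cluster status; the ZooKeeper connect string can be read
# from the response instead of being hard-coded.
clusterstatus_url() {
  printf '%s/solr/admin/collections?action=CLUSTERSTATUS' "$SOLR_URL"
}

# Step 4: RELOAD with an async request id so the call returns at once.
reload_async_url() {
  printf '%s/solr/admin/collections?action=RELOAD&name=%s&async=%s' "$SOLR_URL" "$1" "$2"
}

# Step 5: poll the async request until every node reports completion.
requeststatus_url() {
  printf '%s/solr/admin/collections?action=REQUESTSTATUS&requestid=%s' "$SOLR_URL" "$1"
}
```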

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 29, 2018, at 8:15 PM, Erick Erickson <er...@gmail.com> wrote:
> 
> Adding to Shawn's comments.
> 
> You've pretty much nailed all the possibilities, it depends on
> what you're most comfortable with I suppose.
> 
> The only thing I'd add is that you probably have dev and prod
> environments and work out the correct schemas on dev then
> migrate to prod (at least that's what paranoid people like me
> do). At some point you have to graduate to prod when you're
> happy with your dev configs. It shouldn't be much problem to
> script something that
> 
> - pulled the configs from dev
> - pushed them to Git
> - pulled them from Git (sanity check)
> - pushed them to prod's ZK
> - reloaded the collection.
> 
> I'd be a little reluctant to script all the managed schema steps,
> too easy to forget to put one of those steps in. I'm picturing
> someone at 3 AM _finally_ getting all the schema figured out and
> forgetting to properly log the managed schema step. In my
> proposal, it wouldn't matter, what you're archiving is the end result
> after you've done all your QA.
> 
> FWIW,
> Erick
> 
> P.S. you'd be surprised how many prod setups I've seen where
> they don't put their configs in VCS. Makes me break out in
> hives so kudos... ;)
> 
> On Fri, Jun 29, 2018 at 5:04 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>> On 6/29/2018 3:26 PM, Zimmermann, Thomas wrote:
>>> We're transitioning from Solr 4.10 to 7.x and working through our options around managing our schemas. Currently we manage our schema files in a git repository, make changes to the xml files,
>> 
>> Hopefully you've got the entire config in version control and not just
>> the schema.
>> 
>>> and then push them out to our zookeeper cluster via the zkcli and the upconfig command like:
>>> 
>>> /apps/solr/bin/zkcli.sh -cmd upconfig -zkhost host.com:9580 -collection core -confname core -confdir /apps/solr/cores/core/conf/ -solrhome /apps/solr/
>> 
>> I don't think the collection parameter is valid for that command.  It
>> would be valid for the linkconfig command, but not for upconfig.  It's
>> probably not hurting anything, though.
>> 
>>> This allows us to deploy schema changes without restarting the cluster, while maintaining version control. It looks like we could do the exact same process using Solr 7 and the solr control script like
>>> 
>>> bin/solr zk upconfig -z 111.222.333.444:2181 -n mynewconfig -d /path/to/configset
>> 
>> Yes, you can do it that way.
>> 
>> Now of course we'd like to improve this process if possible, since manually pushing schema files to the ZK server and reloading the cores is a bit command-line intensive. Does anyone have any guidance or experience leveraging the managed schema API to make updates to a schema in production while maintaining a version-controlled copy of the schema? I'd considered using the API to make changes to our schemas and then saving off the generated schema file to git, or saving off to git a script that builds the schema through the managed API, but I'm not sure if that is any easier or just adds complexity.
>> 
>> My preferred method would be manual edits, pushing to git, pushing to
>> zookeeper, and reloading the collection.  I'm comfortable with that
>> method, and don't know much about the schema API.
>> 
>> If you're comfortable with the schema API, you can use that, and then
>> use the "downconfig" command on one one of the ZK scripts included with
>> Solr for pushing to git.
>> 
>> Exactly how to handle the automation would depend on what OS platforms
>> are involved and what sort of tools are accessible to those who will be
>> making the changes.  If it would be on a system accessed with a
>> commandline shell, then a commandline script (perhaps a shell script)
>> seems like the best option.  A script could be created that runs the
>> necessary git commands and then the Solr script to upload the new
>> config, and it could even reload the collection with a tool like curl.
>> 
>> Thanks,
>> Shawn
>> 


Re: Managed Schemas and Version Control

Posted by Erick Erickson <er...@gmail.com>.
Adding to Shawn's comments.

You've pretty much nailed all the possibilities; it depends on
what you're most comfortable with, I suppose.

The only thing I'd add is that you probably have dev and prod
environments and work out the correct schemas on dev then
migrate to prod (at least that's what paranoid people like me
do). At some point you have to graduate to prod when you're
happy with your dev configs. It shouldn't be much of a problem to
script something that

- pulled the configs from dev
- pushed them to Git
- pulled them from Git (sanity check)
- pushed them to prod's ZK
- reloaded the collection.
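Those five steps could be sketched as a single shell function, under obvious assumptions: the dev/prod ZooKeeper hosts, the prod Solr URL, the repo layout, and the config name are all placeholders, and the config-name and collection-name are assumed to match.

```shell
set -e
DEV_ZK="${DEV_ZK:-dev-zk:2181}"
PROD_ZK="${PROD_ZK:-prod-zk:2181}"
PROD_SOLR="${PROD_SOLR:-http://prod-solr:8983}"

prod_reload_url() {
  printf '%s/solr/admin/collections?action=RELOAD&name=%s' "$PROD_SOLR" "$1"
}

# promote CONFNAME: run the five steps end to end.
promote() {
  bin/solr zk downconfig -z "$DEV_ZK" -n "$1" -d "configs/$1"    # pull from dev
  git add "configs/$1" && git commit -m "Promote $1" && git push # push to Git
  git pull --ff-only                                            # sanity check
  bin/solr zk upconfig -z "$PROD_ZK" -n "$1" -d "configs/$1"    # push to prod ZK
  curl -s "$(prod_reload_url "$1")"                             # reload collection
}
```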

I'd be a little reluctant to script all the managed schema steps;
it's too easy to forget to put one of those steps in. I'm picturing
someone at 3 AM _finally_ getting all the schema figured out and
forgetting to properly log the managed schema step. In my
proposal it wouldn't matter: what you're archiving is the end result
after you've done all your QA.

FWIW,
Erick

P.S. you'd be surprised how many prod setups I've seen where
they don't put their configs in VCS. Makes me break out in
hives so kudos... ;)

On Fri, Jun 29, 2018 at 5:04 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 6/29/2018 3:26 PM, Zimmermann, Thomas wrote:
>> We're transitioning from Solr 4.10 to 7.x and working through our options around managing our schemas. Currently we manage our schema files in a git repository, make changes to the xml files,
>
> Hopefully you've got the entire config in version control and not just
> the schema.
>
>> and then push them out to our zookeeper cluster via the zkcli and the upconfig command like:
>>
>> /apps/solr/bin/zkcli.sh -cmd upconfig -zkhost host.com:9580 -collection core -confname core -confdir /apps/solr/cores/core/conf/ -solrhome /apps/solr/
>
> I don't think the collection parameter is valid for that command.  It
> would be valid for the linkconfig command, but not for upconfig.  It's
> probably not hurting anything, though.
>
>> This allows us to deploy schema changes without restarting the cluster, while maintaining version control. It looks like we could do the exact same process using Solr 7 and the solr control script like
>>
>> bin/solr zk upconfig -z 111.222.333.444:2181 -n mynewconfig -d /path/to/configset
>
> Yes, you can do it that way.
>
>> Now of course we'd like to improve this process if possible, since manually pushing schema files to the ZK server and reloading the cores is a bit command-line intensive. Does anyone have any guidance or experience leveraging the managed schema API to make updates to a schema in production while maintaining a version-controlled copy of the schema? I'd considered using the API to make changes to our schemas and then saving off the generated schema file to git, or saving off to git a script that builds the schema through the managed API, but I'm not sure if that is any easier or just adds complexity.
>
> My preferred method would be manual edits, pushing to git, pushing to
> zookeeper, and reloading the collection.  I'm comfortable with that
> method, and don't know much about the schema API.
>
> If you're comfortable with the schema API, you can use that, and then
> use the "downconfig" command on one one of the ZK scripts included with
> Solr for pushing to git.
>
> Exactly how to handle the automation would depend on what OS platforms
> are involved and what sort of tools are accessible to those who will be
> making the changes.  If it would be on a system accessed with a
> commandline shell, then a commandline script (perhaps a shell script)
> seems like the best option.  A script could be created that runs the
> necessary git commands and then the Solr script to upload the new
> config, and it could even reload the collection with a tool like curl.
>
> Thanks,
> Shawn
>

Re: Managed Schemas and Version Control

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/29/2018 3:26 PM, Zimmermann, Thomas wrote:
> We're transitioning from Solr 4.10 to 7.x and working through our options around managing our schemas. Currently we manage our schema files in a git repository, make changes to the xml files,

Hopefully you've got the entire config in version control and not just
the schema.

> and then push them out to our zookeeper cluster via the zkcli and the upconfig command like:
>
> /apps/solr/bin/zkcli.sh -cmd upconfig -zkhost host.com:9580 -collection core -confname core -confdir /apps/solr/cores/core/conf/ -solrhome /apps/solr/

I don't think the collection parameter is valid for that command.  It
would be valid for the linkconfig command, but not for upconfig.  It's
probably not hurting anything, though.

> This allows us to deploy schema changes without restarting the cluster, while maintaining version control. It looks like we could do the exact same process using Solr 7 and the solr control script like
>
> bin/solr zk upconfig -z 111.222.333.444:2181 -n mynewconfig -d /path/to/configset

Yes, you can do it that way.

> Now of course we'd like to improve this process if possible, since manually pushing schema files to the ZK server and reloading the cores is a bit command-line intensive. Does anyone have any guidance or experience leveraging the managed schema API to make updates to a schema in production while maintaining a version-controlled copy of the schema? I'd considered using the API to make changes to our schemas and then saving off the generated schema file to git, or saving off to git a script that builds the schema through the managed API, but I'm not sure if that is any easier or just adds complexity.

My preferred method would be manual edits, pushing to git, pushing to
zookeeper, and reloading the collection.  I'm comfortable with that
method, and don't know much about the schema API.

If you're comfortable with the schema API, you can use that, and then
use the "downconfig" command on one one of the ZK scripts included with
Solr for pushing to git.
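That round trip might look like the following; `downconfig` is the real counterpart of `upconfig` in the Solr 7 control script, but the function, paths, and names here are hypothetical.

```shell
# snapshot_config CONFNAME ZKHOST REPODIR
# Pull the live configset out of ZooKeeper into a git checkout and
# commit it, so API-driven edits still end up under version control.
snapshot_config() {
  bin/solr zk downconfig -z "$2" -n "$1" -d "$3/$1"
  git -C "$3" add "$1"
  git -C "$3" commit -m "Snapshot $1 from ZooKeeper"
}
```

For example, `snapshot_config myconfig localhost:2181 ./configs` after a batch of Schema API changes would capture whatever the API generated.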

Exactly how to handle the automation would depend on what OS platforms
are involved and what sort of tools are accessible to those who will be
making the changes.  If it would be on a system accessed with a
commandline shell, then a commandline script (perhaps a shell script)
seems like the best option.  A script could be created that runs the
necessary git commands and then the Solr script to upload the new
config, and it could even reload the collection with a tool like curl.

Thanks,
Shawn