Posted to solr-user@lucene.apache.org by S G <sg...@gmail.com> on 2018/06/15 22:24:35 UTC

Remove schema.xml in favor of managed-schema

Hi,

As per
https://lucene.apache.org/solr/guide/7_2/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-Classicschema.xml,
the only difference between schema.xml and managed-schema is that one
accepts schema-changes through an API while the other one does not.

However, there is also a "mutable" flag that can be used with managed-schema
to turn dynamic changes ON or OFF.

If that is true, then it means schema.xml does not offer anything which
managed-schema does not.

Is that a valid statement to make?

In fact, I see that schema.xml is no longer shipped with Solr?

Thanks
SG

RE: Remove schema.xml in favor of managed-schema

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Elasticsearch allows the mappings to be set all at once, either in a template or as index settings.  That is an important feature because it allows the field definitions to be source code artifacts, which can be deployed very easily by an automated script.

Solr's Managed Schema API allows multiple changes to be combined into a single POST, but the API changes are not declarative: they modify the current schema rather than setting it.  It would be better if the managed schema API offered a way to declaratively set the field types, fields, dynamic fields, and copy fields through a single API call.  That would replace the current function of schema.xml.
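
To make the "multiple changes in a single POST" point concrete, here is a minimal Python sketch that builds one Schema API request carrying several commands. The host, the collection name "mycollection", and the field and type names are assumptions for illustration only, and the request is built but never sent. Each command modifies whatever schema is currently loaded; nothing in the payload declares the schema as a whole.

```python
import json
import urllib.request

# Hypothetical values, not taken from this thread.
SOLR_URL = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def build_schema_request(commands):
    """Build (but do not send) one POST holding several Schema API commands."""
    body = json.dumps(commands).encode("utf-8")
    return urllib.request.Request(
        f"{SOLR_URL}/{COLLECTION}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Each key is one Schema API command; a list applies it to several items.
commands = {
    "add-field": [
        {"name": "title", "type": "text_general", "stored": True},
        {"name": "author", "type": "string", "stored": True},
    ],
    "add-copy-field": {"source": "title", "dest": "_text_"},
}

request = build_schema_request(commands)
# Sending it would be: urllib.request.urlopen(request)
```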

Since that mechanism does not yet exist, I think it is too soon to eliminate schema.xml.

Setting the schema declaratively to exactly what you want can also be done by uploading a configset.  Since solrconfig.xml isn't going away, that upload step is not eliminated, so relying on the Schema API on top of it seems to add an extra step to a reliable deployment.

That said, as long as there is a strong idea of the baseline schema, achieving the desired schema via add, remove, and replace operations is reasonable.
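
As a rough illustration of reaching a desired schema through add, remove, and replace operations, the sketch below diffs a known baseline against the desired field set and emits the corresponding Schema API commands. The function name and the sample fields are hypothetical, and the approach only works if the baseline really matches what is loaded in Solr.

```python
def schema_commands(baseline, desired):
    """Return Schema API commands that turn `baseline` fields into `desired`.

    Both arguments map field name -> field definition dict.
    """
    commands = {"add-field": [], "delete-field": [], "replace-field": []}
    for name, definition in desired.items():
        if name not in baseline:
            commands["add-field"].append(definition)
        elif baseline[name] != definition:
            commands["replace-field"].append(definition)
    for name in baseline:
        if name not in desired:
            commands["delete-field"].append({"name": name})
    # Drop empty command lists so the payload only carries real changes.
    return {cmd: items for cmd, items in commands.items() if items}


# Hypothetical baseline and target, for illustration only.
baseline = {
    "id": {"name": "id", "type": "string"},
    "title": {"name": "title", "type": "string"},
    "old_flag": {"name": "old_flag", "type": "boolean"},
}
desired = {
    "id": {"name": "id", "type": "string"},
    "title": {"name": "title", "type": "text_general"},
    "author": {"name": "author", "type": "string"},
}

plan = schema_commands(baseline, desired)
```

The resulting dict could be sent as the body of a single Schema API POST; if the real schema has drifted from the assumed baseline, the plan will be wrong, which is the risk described above.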

> -----Original Message-----
> From: Doug Turnbull [mailto:dturnbull@opensourceconnections.com]
> Sent: Tuesday, June 19, 2018 12:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Remove schema.xml in favor of managed-schema
> 
> I actually prefer the classic config-files approach over managed schemas.
> Having used Elasticsearch (where everything is configured through an
> API) as well as managed and non-managed Solr, I prefer the legacy
> non-managed Solr way of doing things when it's possible.
>
> - With 'managed' approaches, the config code often turns into spaghetti
> spread throughout the client application and becomes harder to maintain
> - The client application is often done in any number of programming
> languages, client APIs, etc., which makes it harder to ramp up new Solr devs
> on how the search engine works
> - The file-based config can be versioned and deployed as an artifact that
> only contains config bits relevant to the search engine
>
> I know there's a lot of 'it depends'. For example, if I am programmatically
> changing config in real time without wanting to restart the search engine,
> then I can see the benefit of the managed config, especially in a large,
> complex deployment. But most Solr instances I see are not of the giant,
> complex-to-configure variety, and the config-file approach is simplest for
> most teams.
> 
> At least that's my 2 cents :)
> -Doug
> 
> 
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug

Re: Remove schema.xml in favor of managed-schema

Posted by S G <sg...@gmail.com>.
"And that managed-schema will reorder the entries and delete the comments
on first API modification." - This is very irritating when comparing
files against the default version shipped with Solr to see what has changed.
When upgrading schemas/configs for a new version of Solr, such automatically
removed comments are a giant pain to work with.
This does not mean that managed-schema is less useful, but Solr should try
to preserve the comments, formatting, etc. when adding content through the
schema APIs.



On Wed, Jun 20, 2018 at 4:35 PM Walter Underwood <wu...@wunderwood.org>
wrote:

> I strongly prefer the classic config files approach. Our config files are
> checked into version control. We update on the fly by uploading new files
> to Zookeeper, then reloading the collection. No restart needed.
>
> Pushing changes to prod is straightforward. Check out the tested files,
> load them into the prod cluster, reload the collection.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>

Re: Remove schema.xml in favor of managed-schema

Posted by Walter Underwood <wu...@wunderwood.org>.
I strongly prefer the classic config files approach. Our config files are checked into
version control. We update on the fly by uploading new files to ZooKeeper, then
reloading the collection. No restart needed.

Pushing changes to prod is straightforward. Check out the tested files, load them
into the prod cluster, reload the collection. 
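
A deploy step along these lines might look like the following Python sketch. It only builds the upload command and the reload URL; the ZooKeeper address, config name, directory, and collection name are hypothetical, and nothing is executed here.

```python
from urllib.parse import urlencode

# Hypothetical names; adjust to the actual cluster.
ZK_HOST = "zk1:2181"
CONFIG_NAME = "products_conf"
CONFIG_DIR = "./conf"
COLLECTION = "products"
SOLR_URL = "http://localhost:8983/solr"


def upconfig_command(name, conf_dir, zk_host):
    """Command line that uploads a config directory to ZooKeeper."""
    return ["bin/solr", "zk", "upconfig", "-n", name, "-d", conf_dir, "-z", zk_host]


def reload_url(solr_url, collection):
    """Collections API URL that reloads a collection so it re-reads its config."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_url}/admin/collections?{query}"


command = upconfig_command(CONFIG_NAME, CONFIG_DIR, ZK_HOST)
url = reload_url(SOLR_URL, COLLECTION)
# A deploy script could run subprocess.run(command, check=True) and then GET url.
```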

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 19, 2018, at 9:06 AM, Doug Turnbull <dt...@opensourceconnections.com> wrote:
> 
> I actually prefer the classic config-files approach over managed schemas.
> Having used Elasticsearch (where everything is configured through an
> API) as well as managed and non-managed Solr, I prefer the legacy
> non-managed Solr way of doing things when it's possible.
>
> - With 'managed' approaches, the config code often turns into spaghetti
> spread throughout the client application and becomes harder to maintain
> - The client application is often done in any number of programming
> languages, client APIs, etc., which makes it harder to ramp up new Solr devs
> on how the search engine works
> - The file-based config can be versioned and deployed as an artifact that
> only contains config bits relevant to the search engine
>
> I know there's a lot of 'it depends'. For example, if I am programmatically
> changing config in real time without wanting to restart the search engine,
> then I can see the benefit of the managed config, especially in a large,
> complex deployment. But most Solr instances I see are not of the giant,
> complex-to-configure variety, and the config-file approach is simplest for
> most teams.
> 
> At least that's my 2 cents :)
> -Doug
> 
> 
> -- 
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug


Re: Remove schema.xml in favor of managed-schema

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
I actually prefer the classic config-files approach over managed schemas.
Having used Elasticsearch (where everything is configured through an
API) as well as managed and non-managed Solr, I prefer the legacy
non-managed Solr way of doing things when it's possible.

- With 'managed' approaches, the config code often turns into spaghetti
spread throughout the client application and becomes harder to maintain
- The client application is often done in any number of programming
languages, client APIs, etc., which makes it harder to ramp up new Solr devs
on how the search engine works
- The file-based config can be versioned and deployed as an artifact that
only contains config bits relevant to the search engine

I know there's a lot of 'it depends'. For example, if I am programmatically
changing config in real time without wanting to restart the search engine,
then I can see the benefit of the managed config, especially in a large,
complex deployment. But most Solr instances I see are not of the giant,
complex-to-configure variety, and the config-file approach is simplest for
most teams.

At least that's my 2 cents :)
-Doug


On Tue, Jun 19, 2018 at 11:58 AM Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> And that managed-schema will reorder the entries and delete the comments on
> first API modification.
>
> Regards,
>     Alex
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: Remove schema.xml in favor of managed-schema

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
And that managed-schema will reorder the entries and delete the comments on
first API modification.
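
As an analogy only (this is Python's standard XML library, not Solr's Java code), the sketch below shows how comments can vanish when a file is parsed into an in-memory model, modified, and written back out. The sample schema snippet and field names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up hand-edited schema fragment with an explanatory comment.
hand_edited = """<schema name="example" version="1.6">
  <!-- Unique key; do not change the type. -->
  <field name="id" type="string" indexed="true" stored="true"/>
</schema>"""

# The default parser keeps elements and attributes but discards comments.
root = ET.fromstring(hand_edited)

# Simulate an API-style modification applied to the in-memory model.
ET.SubElement(root, "field", {"name": "title", "type": "text_general"})

# Serialising the model back out produces a file without the comment.
rewritten = ET.tostring(root, encoding="unicode")
```

Any tool that regenerates the file from its parsed model rather than patching the text in place will tend to behave this way, which is why diffs against the shipped defaults get noisy.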

Regards,
    Alex

On Tue, Jun 19, 2018, 11:47 AM Shawn Heisey, <ap...@elyograg.org> wrote:

> On 6/17/2018 6:48 PM, S G wrote:
> > I only wanted to know if schema.xml offers anything that managed-schema
> > does not.
>
> The only difference between the two is that there is a different
> filename and the managed version can be modified by API calls.  The
> schema format and what you can do within that format are identical either
> way.
>
> Thanks,
> Shawn
>
>

Re: Remove schema.xml in favor of managed-schema

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/17/2018 6:48 PM, S G wrote:
> I only wanted to know if schema.xml offers anything that managed-schema does
> not.

The only difference between the two is that there is a different
filename and the managed version can be modified by API calls.  The
schema format and what you can do within that format are identical either
way.
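
For reference, the choice between the two is made by the schemaFactory element in solrconfig.xml. The Python sketch below embeds two example fragments and checks whether each would accept Schema API changes. The helper accepts_api_changes is hypothetical, and its handling of defaults (managed and mutable when nothing is declared) reflects my reading of the 7.x reference guide, so treat it as an assumption.

```python
import xml.etree.ElementTree as ET

# Example solrconfig.xml fragments, trimmed to the relevant element.
MANAGED = """<config>
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>
</config>"""

CLASSIC = """<config>
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>"""


def accepts_api_changes(solrconfig_xml):
    """Hypothetical helper: True if this config allows Schema API edits."""
    factory = ET.fromstring(solrconfig_xml).find("schemaFactory")
    if factory is None:
        # Assumed default: an undeclared factory means managed and mutable.
        return True
    if factory.get("class") != "ManagedIndexSchemaFactory":
        return False
    mutable = factory.find("bool[@name='mutable']")
    # Assumed default: a missing flag means mutable.
    return mutable is None or (mutable.text or "").strip().lower() == "true"
```

With "mutable" set to false, the managed factory still reads managed-schema but refuses API edits, which is the case the original question was asking about.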

Thanks,
Shawn


Re: Remove schema.xml in favor of managed-schema

Posted by S G <sg...@gmail.com>.
I think my query got misinterpreted.

I only wanted to know if schema.xml offers anything that managed-schema does
not.

Best,
SG


On Sat, Jun 16, 2018 at 6:45 PM Erick Erickson <er...@gmail.com>
wrote:

> Currently, there are no restrictions on hand-editing config files,
> mutable or not.
>
> The rub is that any of the APIs that modify configs operate on their
> in-memory copy and write that out (both Cloud and stand-alone modes).
>
> So if I start Solr, the nodes have an image of the configs as of time T.
> Now I hand-edit the file(s) and push them to ZooKeeper, say at time T1.
> Now I use the API to update them at T2.
> At this point, my changes pushed at T1 are lost, since the T2 changes
> used the in-memory copies read at time T as a basis for mods.
>
> If I change this even slightly:
> - Start Solr at T
> - Hand-edit and push at T1 _and reload my collection_
> - Use the API at T2
> At this point my configs have all the changes from T1 and T2 in them, since
> the reload re-read the configs.
>
> Ditto if I restart all my Solr instances after T1 but before T2.
>
> That said, how this will change in the future I have no idea. I
> usually hand-edit them but that's a personal problem.
>
> IIRC, at one point, there was one restriction: A mutable schema could
> _not_ be named schema.xml. But whether that's an accurate memory and
> if so whether it's still current I'm not sure about.
>
> And all of _that_ said, hand-editing mutable configs does, indeed,
> violate all sorts of contracts, and support may change in the future;
> it's "at your own risk and you better know what you're doing". The
> same could be said for hand-editing the configs in the first place,
> though, I suppose ;)
>
> Best,
> Erick
>

Re: Remove schema.xml in favor of managed-schema

Posted by Erick Erickson <er...@gmail.com>.
Currently, there are no restrictions on hand-editing config files,
mutable or not.

The rub is that any of the APIs that modify configs operate on their
in-memory copy and write that out (both Cloud and stand-alone modes).

So if I start Solr, the nodes have an image of the configs as of time T.
Now I hand-edit the file(s) and push them to ZooKeeper, say at time T1.
Now I use the API to update them at T2.
At this point, my changes pushed at T1 are lost, since the T2 changes
used the in-memory copies read at time T as a basis for mods.

If I change this even slightly:
- Start Solr at T
- Hand-edit and push at T1 _and reload my collection_
- Use the API at T2
At this point my configs have all the changes from T1 and T2 in them, since
the reload re-read the configs.

Ditto if I restart all my Solr instances after T1 but before T2.
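
The T / T1 / T2 sequence can be modelled with a toy simulation. This is a simplified Python sketch of the behaviour described above, not Solr code; the class names and field names are invented, and the schema is reduced to a set of field names.

```python
class ConfigStore:
    """Stands in for the stored schema (ZooKeeper or the conf directory)."""

    def __init__(self, fields):
        self.fields = set(fields)


class Node:
    """Stands in for a Solr node holding an in-memory copy of the schema."""

    def __init__(self, store):
        self.store = store
        self.reload()

    def reload(self):
        # Re-read the stored schema into memory.
        self.in_memory = set(self.store.fields)

    def api_add_field(self, name):
        # API changes are applied to the in-memory copy, which is then
        # written out in full, replacing whatever is stored.
        self.in_memory.add(name)
        self.store.fields = set(self.in_memory)


def run(reload_after_hand_edit):
    store = ConfigStore({"id"})
    node = Node(store)                # time T: node reads the config
    store.fields.add("hand_edited")   # time T1: hand-edit and push
    if reload_after_hand_edit:
        node.reload()                 # collection reload re-reads the config
    node.api_add_field("via_api")     # time T2: change through the API
    return store.fields


lost = run(reload_after_hand_edit=False)
kept = run(reload_after_hand_edit=True)
```

Without the reload, the stored result contains only "id" and "via_api"; with the reload, "hand_edited" survives as well.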

That said, how this will change in the future I have no idea. I
usually hand-edit them but that's a personal problem.

IIRC, at one point, there was one restriction: A mutable schema could
_not_ be named schema.xml. But whether that's an accurate memory and
if so whether it's still current I'm not sure about.

And all of _that_ said, hand-editing mutable configs does, indeed,
violate all sorts of contracts, and support may change in the future;
it's "at your own risk and you better know what you're doing". The
same could be said for hand-editing the configs in the first place,
though, I suppose ;)

Best,
Erick

On Sat, Jun 16, 2018 at 1:34 PM, Doug Turnbull
<dt...@opensourceconnections.com> wrote:
> I'm not sure changing something from mutable -> immutable means it suddenly
> becomes hand-editable.
>
> I don't know the details here, but I can imagine a case where immutable
> implies some level of consistency, where the file is hashed and later
> might be confirmed to still be in the same 'immutable' state. Hand editing
> would violate that contract.
>
> One might also imagine a future where 'managed-schema' isn't a config file,
> and is just an API you use to configure Solr. In this case 'mutable'
> doesn't imply anything about files, just the state of the Solr config.
>
> -Doug
>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug

Re: Remove schema.xml in favor of managed-schema

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
I'm not sure changing something from mutable -> immutable means it suddenly
becomes hand-editable.

I don't know the details here, but I can imagine a case where immutable
implies some level of consistency, where the file is hashed and later
might be confirmed to still be in the same 'immutable' state. Hand editing
would violate that contract.

One might also imagine a future where 'managed-schema' isn't a config file,
and is just an API you use to configure Solr. In this case 'mutable'
doesn't imply anything about files, just the state of the Solr config.

-Doug

On Sat, Jun 16, 2018 at 12:24 AM S G <sg...@gmail.com> wrote:

> Hi,
>
> As per
>
> https://lucene.apache.org/solr/guide/7_2/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-Classicschema.xml
> ,
> the only difference between schema.xml and managed-schema is that one
> accepts schema-changes through an API while the other one does not.
>
> However, there is also a "mutable" flag that can be used with managed-schema
> to turn dynamic changes ON or OFF.
>
> If that is true, then it means schema.xml does not offer anything which
> managed-schema does not.
>
> Is that a valid statement to make?
>
> In fact, I see that schema.xml is no longer shipped with Solr?
>
> Thanks
> SG
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug