Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2013/04/01 06:13:15 UTC
[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a "managed" schema facility

    [ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618599#comment-13618599 ] 

Robert Muir commented on SOLR-4658:
-----------------------------------

It seems a little weird to tie all this ZooKeeper etc. stuff into IndexSchema, and I'm still trying to figure out the mutable/managed stuff.

If the goal is to have multiple implementations of IndexSchema (immutable ones backed by human-edited files, mutable ones saved to some opaque "database" that can be edited via REST), then why not make IndexSchema abstract and pluggable from solrconfig.xml like anything else?
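
One way to picture this suggestion is the minimal Java sketch below. Every name in it (SketchSchema, FileBackedSketchSchema, ManagedSketchSchema, SketchSchemaFactory) is hypothetical and is neither the actual Solr API nor the attached patch. It only illustrates an abstract schema with one immutable and one mutable implementation, where the choice is delegated to a factory that could, in principle, be named in solrconfig.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an abstract IndexSchema; not the real Solr class.
abstract class SketchSchema {
    // Field name -> field type name, kept in insertion order.
    protected final Map<String, String> fields = new LinkedHashMap<>();

    protected SketchSchema(Map<String, String> initialFields) {
        fields.putAll(initialFields);
    }

    // Returns the type name of a field, or null if the field is unknown.
    String fieldType(String name) {
        return fields.get(name);
    }

    abstract boolean isMutable();

    // Adds a field, or throws if this implementation is read-only.
    abstract void addField(String name, String type);
}

// Immutable variant: models a schema backed by a human-edited file.
final class FileBackedSketchSchema extends SketchSchema {
    FileBackedSketchSchema(Map<String, String> parsedFields) {
        super(parsedFields);
    }

    @Override
    boolean isMutable() {
        return false;
    }

    @Override
    void addField(String name, String type) {
        throw new UnsupportedOperationException("schema is read-only");
    }
}

// Mutable variant: models a schema owned by the system and edited via an API.
final class ManagedSketchSchema extends SketchSchema {
    ManagedSketchSchema(Map<String, String> initialFields) {
        super(initialFields);
    }

    @Override
    boolean isMutable() {
        return true;
    }

    @Override
    void addField(String name, String type) {
        if (fields.containsKey(name)) {
            throw new IllegalArgumentException("field already exists: " + name);
        }
        fields.put(name, type);
        // A real implementation would persist the change here (assumption).
    }
}

// Hypothetical plugin point: the implementation would be chosen by configuration.
interface SketchSchemaFactory {
    SketchSchema create(Map<String, String> initialFields);
}
```

With this shape, a factory such as `SketchSchemaFactory factory = ManagedSketchSchema::new;` selects the mutable implementation, and callers only ever see the abstract type.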

                
> In preparation for dynamic schema modification via REST API, add a "managed" schema facility
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4658
>                 URL: https://issues.apache.org/jira/browse/SOLR-4658
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 4.3
>
>         Attachments: SOLR-4658.patch
>
>
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
> <schema managed="true" mutable="true" managedSchemaResourceName="managed-schema"/>
> {code} 
> It will be a precondition for future dynamic schema modification APIs that {{mutable="true"}}.  {{solrconfig.xml}} parsing will fail if {{mutable="true"}} but {{managed="false"}}.
> When {{managed="true"}}, and the resource named in {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade the schema to "managed": the non-managed schema resource (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}.
> Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}} REST API, and can use this as the basis for modifications which can then be used to manually downgrade back to non-managed schema: put the {{schema.xml}} in place, then add {{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} element, since {{managed="false"}} is the default).
> If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}} will include {{<schema managed="false" ...>}}.
> For a discussion of rationale for this feature, see [~hossman_lucene@fucit.org]'s post to the solr-user mailing list in the thread "Dynamic schema design: feedback requested" [http://markmail.org/message/76zj24dru2gkop7b]:
>  
> {quote}
> Ignoring for a moment what format is used to persist schema information, I 
> think it's important to have a conceptual distinction between "data" that 
> is managed by applications and manipulated by a REST API, and "config" 
> that is managed by the user and loaded by solr on init -- or via an 
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced 
> this opinion: on one hand, it's a place users must specify some config 
> information -- so people want to be able to keep it in version control 
> with other config files.  On the other hand it's a "live" data file that 
> is rewritten by solr when cores are added.  (God help you if you want to do a 
> rolling deploy of a new version of solr.xml where you've edited some of the 
> config values while simultaneously clients are creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information 
> as "data" that can be manipulated, I anticipate the same types of 
> confusion, misunderstanding, and grumblings if we try to use the same 
> pattern of treating the existing schema.xml (or some new schema.json) as a 
> hybrid configs & data file.  "Edit it by hand if you want, the /schema/* 
> REST API will too!"  ... Even assuming we don't make any of the same 
> technical mistakes that have caused problems with solr.xml round tripping 
> in the past (ie: losing comments, reading new config options that we 
> forget to write back out, etc...) i'm fairly certain there are still going 
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and 
> "machine readable & writable", but practically speaking it's hard to have a 
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of 
> implementation but also for end user clarity -- to have some simple, 
> straightforward to understand caveats about maintaining schema 
> information...
> 1) If you want to keep schema information in an authoritative config file 
> that you can manually edit, then the /schema REST API will be read only. 
> 2) If you wish to use the /schema REST API for read and write operations, 
> then schema information will be persisted under the covers in a data store 
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using 
> the /schema REST API for managing schema information, there is a 
> tool/command/API you can run to do so.
> 4) If you are using the /schema REST API for managing schema information, 
> and you wish to switch to using a schema config file, there is a 
> tool/command/API you can run to export the schema info in a config file 
> format.
> ...whether or not the "under the covers in a data store" used by the REST 
> API is JSON, or some binary data, or an XML file that is just schema.xml w/o 
> whitespace/comments should be an implementation detail.  Likewise is the 
> question of whether some new config file formats are added -- it shouldn't 
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no 
> : different than today when you edit the file and do a Core reload or 
> : something. I think we can improve some validation stuff around that, but 
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the 
> file concurrently, and info that might be lost due to Solr reading the 
> file, manipulating internal state, and then writing the file back out.  
> Eg: User hand edits may be lost if they happen on disk during Solr's 
> internal manipulation of data.  API edits may be reflected in the internal 
> state, but lost if the User writes the file directly and then does a core 
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand 
> : modified file. Many people *heavily* modify the example schema to fit 
> : their use case. If you have to start doing that by making 50 rest API 
> : calls, that's pretty rough. Once you get your schema nice and happy, you 
> : might script out those rest calls, but initially, it's much 
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that.  The ability to say 
> "my schema is a config file and i own it" should always exist (remove 
> it over my dead body) 
> The question is what trade offs to expect/require for people who would 
> rather use an API to manipulate these things -- i don't think it's 
> unreasonable to say "if you would like to manipulate the schema using an 
> API, then you give up the ability to manipulate it as a config file on 
> disk"
> ("if you want the /schema API to drive your car, you have to take your 
> foot off the pedals and let go of the steering wheel")
> {quote}
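
The configuration rules in the issue description quoted above (that mutable="true" requires managed="true", and that a managed schema is created by upgrade when the managed resource does not yet exist) can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class SchemaSettings, its Action enum, and the method startupAction are hypothetical names, not code from the SOLR-4658 patch. The sketch also assumes that mutable defaults to false when the <schema/> element is absent; the description states only that managed="false" is the default.

```java
// Hypothetical model of the <schema/> element's attributes in solrconfig.xml.
final class SchemaSettings {
    // What to do with the schema at startup, per the rules in the description.
    enum Action { USE_NON_MANAGED, UPGRADE_TO_MANAGED, USE_MANAGED }

    final boolean managed;
    final boolean mutable;
    final String managedSchemaResourceName;

    SchemaSettings(boolean managed, boolean mutable, String managedSchemaResourceName) {
        // Parsing fails if mutable="true" but managed="false".
        if (mutable && !managed) {
            throw new IllegalArgumentException("mutable=\"true\" requires managed=\"true\"");
        }
        this.managed = managed;
        this.mutable = mutable;
        this.managedSchemaResourceName = managedSchemaResourceName;
    }

    // Settings when the <schema/> element is absent: managed="false" is the default.
    // Assumption: mutable is also false, and no managed resource name is needed.
    static SchemaSettings defaults() {
        return new SchemaSettings(false, false, null);
    }

    // Decides the startup behaviour from whether the managed resource already exists.
    Action startupAction(boolean managedResourceExists) {
        if (!managed) {
            return Action.USE_NON_MANAGED;
        }
        // Managed but not yet persisted: parse the non-managed schema, persist it
        // under managedSchemaResourceName, and rename the original with ".bak".
        return managedResourceExists ? Action.USE_MANAGED : Action.UPGRADE_TO_MANAGED;
    }
}
```

For example, `new SchemaSettings(true, true, "managed-schema").startupAction(false)` yields `UPGRADE_TO_MANAGED`, while constructing `new SchemaSettings(false, true, "managed-schema")` is rejected.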
