Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2013/08/29 23:37:52 UTC

[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set

    [ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754089#comment-13754089 ] 

Erick Erickson commented on SOLR-4478:
--------------------------------------

I got to thinking about this while trying to take it out of mothballs, and I'm starting to think it's a terrible idea for 4.x. It should be postponed or abandoned unless and until we do something like what has been discussed elsewhere: having "one source of truth" (ZooKeeper has been discussed, for instance). So I'll list the issues I've thought about, and if there are straightforward answers to them I'll be happy to reconsider.

Each issue is probably technically doable, but the sum of them (plus the ones I haven't seen yet) totally scares me.

1> Traditional master/slave architectures. Let's say we change the schema (it'd have to be on the master?). How do we get that to the slaves? Currently the confFiles directive has an explicit test and will not copy a directory. I'm not convinced it'd even work with relative paths, and listing _every_ file in the configset dir would be kludgy at best. And I think the confFiles directive doesn't work outside the "conf" directory for the core it's replicating anyway. I suppose the user could copy the configset directory to all the nodes in the farm, but....

2> The new REST API for modifying the schema. In non-SolrCloud mode, how does that work? Is it only allowed on the master (assuming we can solve 1>)? How would that be enforced?

3> Sharing the solrConfig object is also fraught with issues as discussed above. There's already the "share schema" option, so at least it's possible to have one shared schema.

4> How to get any changes reloaded in a master/slave environment for all the affected cores on all the machines? You'd need some kind of manual process of going to each one and issuing a new command "ReloadAllCores", or we'd need to build in some kind of notification system. Or we'd need to require the user to keep a list of all the nodes and all the cores and script reloading them all. Nobody should be re-inventing ZooKeeper.

5> How to get any changes reloaded in even the non master/slave environment for all the affected cores? A new command? Periodic polling? Check every query/update request?

6> Sticky wickets I haven't thought of yet. I'm afraid, very afraid... Each of these is solvable, but considering the effort involved it doesn't seem worth pursuing right now; at least, my interest is disappearing.
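
To make the scripting burden in 4> concrete, here is a minimal sketch in Python of the "keep a list of all the nodes and all the cores and script reloading them all" approach. It is not part of any patch on this issue. The host names, core names, the INVENTORY table and the helper functions are all hypothetical; the only existing Solr piece it relies on is the CoreAdmin RELOAD action. The point is that the inventory has to be maintained by hand on every topology change.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical, hand-maintained inventory: Solr base URL -> cores on that node.
# Keeping this list accurate is exactly the bookkeeping described in 4>.
INVENTORY = {
    "http://solr1:8983/solr": ["core0", "core1"],
    "http://solr2:8983/solr": ["core0"],
}


def reload_urls(inventory):
    """Build one CoreAdmin RELOAD URL per (node, core) pair."""
    urls = []
    for base_url, cores in sorted(inventory.items()):
        for core in cores:
            query = urlencode({"action": "RELOAD", "core": core})
            urls.append(f"{base_url}/admin/cores?{query}")
    return urls


def http_get(url):
    """Default fetcher: issue the request and discard the response body."""
    with urlopen(url, timeout=10) as response:
        response.read()


def reload_all(inventory, fetch=http_get):
    """Issue every reload; collect failures instead of stopping at the first."""
    failures = []
    for url in reload_urls(inventory):
        try:
            fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            failures.append((url, str(exc)))
    return failures
```

Calling reload_all(INVENTORY) would hit every listed core in turn and return the ones that failed. Nothing here notices a node that was added but never entered in the list, which is the part ZooKeeper already takes care of.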

And wrapped around this is that SolrCloud already handles most of the things I'm worried about, especially getting changes propagated to all the right places in the cluster. SolrCloud already has a way to reload all the nodes that take part in a collection. SolrCloud already has the notifications of changes to the config set built in (at least I think so; if not, it will).

My feeling at this point is that supporting this well would turn into a huge amount of work _that would then be thrown away_ if we go to a "one source of truth" model in Solr5 (or even 6). And that actually _using_ the capability would be fragile and complex. So unless I can be convinced otherwise, I'm going to assign this back to nobody and forget about it.

                
> Allow cores to specify a named config set
> -----------------------------------------
>
>                 Key: SOLR-4478
>                 URL: https://issues.apache.org/jira/browse/SOLR-4478
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.2, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-4478.patch, SOLR-4478.patch
>
>
> Part of moving forward to "the new way", after SOLR-4196 etc... I propose an additional parameter specified on the <core> node in solr.xml or as a parameter in the "discovery" mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core.
> Straw-man: There will be a directory <solr_home>/configsets which will be the default. If the configSet parameter is, say, "myconf", then I'd expect a directory named "myconf" to exist in <solr_home>/configsets, which would look something like
> <solr_home>/configsets/myconf/schema.xml
>                               solrconfig.xml
>                               stopwords.txt
>                               velocity
>                               velocity/query.vm
> etc.
> If multiple cores used the same configSet, schema, solrconfig etc. would all be shared (i.e. shareSchema="true" would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored but maybe log a warning?
> Mostly I'm putting this up for comments. I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going.
> The configSet value can be either a relative or an absolute path; if relative, it's assumed to be relative to <solr_home>.
> Thoughts?
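
For illustration only, the lookup described in the straw-man above could be sketched as follows in Python. This is not code from the attached patches. The function names are hypothetical, and so is the choice of base directory: the straw-man places "myconf" under <solr_home>/configsets, while the later sentence says relative paths are relative to <solr_home>, so the base is left as a parameter that defaults to the straw-man layout. The sample core.properties content is also made up.

```python
from pathlib import PurePosixPath


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_config_set(solr_home, config_set, base="configsets"):
    """Return the directory a configSet value would point at.

    Absolute values are used as-is. Relative values are resolved under
    <solr_home>/<base>; pass base="" to resolve directly under <solr_home>.
    """
    candidate = PurePosixPath(config_set)
    if candidate.is_absolute():
        return candidate
    return PurePosixPath(solr_home) / base / candidate


# Hypothetical core.properties content for a core in "discovery" mode.
props = parse_properties("name=core1\nconfigSet=myconf\n")
conf_dir = resolve_config_set("/var/solr", props["configSet"])
schema_path = conf_dir / "schema.xml"
```

With the defaults, every core that names "myconf" ends up pointing at the same directory, which is what makes the sharing (and the reload questions that follow from it) possible.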
