Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2014/02/11 01:56:25 UTC

[jira] [Updated] (SOLR-5653) Create a RESTManager to provide REST API endpoints for reconfigurable plugins

     [ https://issues.apache.org/jira/browse/SOLR-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Potter updated SOLR-5653:
---------------------------------

    Attachment: SOLR-5653.patch

Here is the first attempt at a solution for the RestManager and implementations for managing stop words and synonyms via a REST API.

A few things to notice about this implementation:

1) The RestManager needs to be able to read/write data from/to ZooKeeper if in cloud mode or the local FS if in standalone mode. This is the purpose of the ManagedResourceStorage.StorageIO interface. The idea here is that the RestManager receives the StorageIO in its constructor from the SolrCore during initialization. Currently, this is done in the SolrCore object, which has to do an instanceof on the SolrResourceLoader to determine if it is ZK aware. This is a bit hacky but I didn't see a better way to determine if a core is running in ZK mode from within the SolrCore object. Currently, I provide 3 implementations of StorageIO: ZooKeeperStorageIO, FileStorageIO, and InMemoryStorageIO.

2) A ManagedResource should be able to choose its own storage format, with the StorageIO being determined by the container. This gives the ManagedResource developer flexibility in how they store data without having to know how to load/store bytes to ZK or the local FS. Currently, the only provided storage format is JSON; see ManagedResourceStorage.JsonStorage.

3) I'm using a "registry" object that is available from the SolrResourceLoader to capture Solr components that declare themselves as being "managed". This is needed because parsing the solrconfig.xml may encounter managed components before it parses and initializes the RestManager. Basically, I wanted to separate the registration of managed components from the initialization of the RestManager and those components as I didn't want to force the position of the <restManager/> element in the solrconfig.xml to be before all other components.

4) The design is based around the concept that there may be many different Solr components that share a single ManagedResource. For instance, there may be many ManagedStopFilterFactory instances declared in schema.xml that share a common set of managed English stop words. Thus, I'm using the "observer" pattern which allows Solr components to register as an observer of a shared ManagedResource. This way we don't end up with 10 different managers of the same stop word list.

5) ManagedResourceObserver instances are notified once during core initialization (load or reload) when the managed data is available. This is their signal to internalize the managed data, such as the ManagedStopFilterFactory converting the managed set of terms into a CharArraySet used for creating StopFilters. This is a critical part of the design in that updates to the managed data are not applied until a core is reloaded. This is to avoid having analysis components with different views of managed data, i.e., we don't want some of the replicas for a shard to have a different set of stop words than the other replicas.

6) I've provided one concrete ManagedResource implementation for managing a word set, which is useful for stop words and protected words (KeywordMarkerFilter). This implementation shows how to handle initArgs and a managedList of words.
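
To make points 1, 4, and 5 concrete, here is a minimal, self-contained sketch of how a container-supplied StorageIO, a shared managed word set, and its observers could fit together. It is not the code in the patch: the method names (load, store, addObserver, update, notifyObserversDuringInit, onManagedDataLoaded) are assumptions made for illustration, and the word list is stored as newline-delimited text rather than the JSON format the patch uses, purely to keep the example free of dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: type names echo the description above, signatures are assumed.
final class ManagedResourceSketch {

  /** Byte-level storage picked by the container (ZooKeeper, local FS, or memory). */
  interface StorageIO {
    byte[] load(String resourceId); // returns null when nothing has been stored
    void store(String resourceId, byte[] data);
  }

  /** In-memory storage, convenient for tests. */
  static final class InMemoryStorageIO implements StorageIO {
    private final Map<String, byte[]> data = new HashMap<>();
    @Override public byte[] load(String resourceId) { return data.get(resourceId); }
    @Override public void store(String resourceId, byte[] bytes) {
      data.put(resourceId, bytes.clone());
    }
  }

  /** Implemented by components that share a managed resource. */
  interface ManagedResourceObserver {
    void onManagedDataLoaded(String resourceId, Set<String> words);
  }

  /** One shared word set with any number of observers. */
  static final class ManagedWordSet {
    private final String resourceId;
    private final StorageIO storage;
    private final List<ManagedResourceObserver> observers = new ArrayList<>();

    ManagedWordSet(String resourceId, StorageIO storage) {
      this.resourceId = resourceId;
      this.storage = storage;
    }

    void addObserver(ManagedResourceObserver observer) { observers.add(observer); }

    /** Persists an update only; observers see it on the next core load or reload. */
    void update(Set<String> words) {
      // Simplification: newline-delimited text, so words must not contain newlines.
      String joined = String.join("\n", new TreeSet<>(words));
      storage.store(resourceId, joined.getBytes(StandardCharsets.UTF_8));
    }

    /** Called once per core load/reload, after the managed data is available. */
    void notifyObserversDuringInit() {
      Set<String> words = new TreeSet<>();
      byte[] bytes = storage.load(resourceId);
      if (bytes != null && bytes.length > 0) {
        Collections.addAll(words, new String(bytes, StandardCharsets.UTF_8).split("\n"));
      }
      Set<String> view = Collections.unmodifiableSet(words);
      for (ManagedResourceObserver observer : observers) {
        observer.onManagedDataLoaded(resourceId, view);
      }
    }
  }
}
```

In this sketch, several filter factories would each call addObserver on the same ManagedWordSet, so only one object manages the list, and a call to update changes stored data without touching any observer until notifyObserversDuringInit runs again.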

Known Issues:

a. The current RestManager attaches its registered endpoints using SolrRestApi, which is configured to process requests to /collection/schema. While this path works for stop words and synonyms, it doesn't work in the general case of any type of ManagedResource. We need to figure out a better path under which to register the RestManager's endpoints, but re-working that should be minor.

b. I had to make a few things public in the BaseSchemaResource class and extended the RestManager.ManagedEndpoint class from it. We should refactor BaseSchemaResource into a BaseSolrResource as it has usefulness beyond schema related resources.

c. Deletes - the ManagedResource framework supports deletes, but I wasn't sure how to enable them in Restlet; again, this is probably a minor issue in the Restlet config / setup.

> Create a RESTManager to provide REST API endpoints for reconfigurable plugins
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-5653
>                 URL: https://issues.apache.org/jira/browse/SOLR-5653
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Steve Rowe
>         Attachments: SOLR-5653.patch
>
>
> It should be possible to reconfigure Solr plugins' resources and init params without directly editing the serialized schema or {{solrconfig.xml}} (see Hoss's arguments about this in the context of the schema, which also apply to {{solrconfig.xml}}, in the description of SOLR-4658)
> The RESTManager should allow plugins declared in either the schema or in {{solrconfig.xml}} to register one or more REST endpoints, one endpoint per reconfigurable resource, including init params.  To allow for multiple plugin instances, registering plugins will need to provide a handle of some form to distinguish the instances.
> This RESTManager should also be able to create new instances of plugins that it has been configured to allow.  The RESTManager will need its own serialized configuration to remember these plugin declarations.
> Example endpoints:
> * SynonymFilterFactory
> ** init params: {{/solr/collection1/config/syns/myinstance/options}}
> ** synonyms resource: {{/solr/collection1/config/syns/myinstance/synonyms-list}}
> * "/select" request handler
> ** init params: {{/solr/collection1/config/requestHandlers/select/options}}
> We should aim for full CRUD over init params and structured resources.  The plugins will bear responsibility for handling resource modification requests, though we should provide utility methods to make this easy.
> However, since we won't be directly modifying the serialized schema and {{solrconfig.xml}}, anything configured in those two places can't be invalidated by configuration serialized elsewhere.  As a result, it won't be possible to remove plugins declared in the serialized schema or {{solrconfig.xml}}.  Similarly, any init params declared in either place won't be modifiable.  Instead, there should be some form of init param that declares that the plugin is reconfigurable, maybe using something like "managed" - note that request handlers already provide a "handle" - the request handler name - and so don't need that to be separately specified:
> {code:xml}
> <requestHandler name="/select" class="solr.SearchHandler">
>    <managed/>
> </requestHandler>
> {code}
> and in the serialized schema - a handle needs to be specified here:
> {code:xml}
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> ...
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" managed="english-synonyms"/>
> ...
> {code}
> All of the above examples use the existing plugin factory class names, but we'll have to create new RESTManager-aware classes to handle registration with RESTManager.
> Core/collection reloading should not be performed automatically when a REST API call is made to one of these RESTManager-mediated REST endpoints, since for batched config modifications, that could take way too long.  But maybe reloading could be a query parameter to these REST API calls. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
