Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2015/07/27 21:12:04 UTC

[jira] [Commented] (SOLR-5606) REST based Collections API

    [ https://issues.apache.org/jira/browse/SOLR-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643229#comment-14643229 ] 

Timothy Potter commented on SOLR-5606:
--------------------------------------

I think striving for a REST solution is not practical at this point with the maturing of the bulk-style Schema and Configs APIs. Rather than continuing to argue about REST (see discussion on SOLR-7312), we should embrace this bulk approach and try to be consistent. In other words, having the same feel across all admin APIs seems more productive than having a REST-based Collection API and a bulk-style Schema / Config API. Specifically, here are some additional ideas I have for this effort:

1) Adapt the current collections API to use the bulk-style API used by Schema and Config API at the {{/solr/<collection>/admin}} endpoint. For instance, to add a replica to shard1 of the *gettingstarted* collection, I would do:

{code}
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-replica":{
     "shard":"shard1",
     "node":"192.168.0.1_solr"}
}' http://localhost:8983/solr/gettingstarted/admin
{code}

The other collection admin actions would work similarly. This also has the benefit of allowing multiple collection admin actions to be applied at the same time, such as to add a replica for each shard at the same time.
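
As a rough sketch of the "replica for each shard" case, the payload could be assembled as below. This assumes the proposed endpoint would accept an array of values under a single command key, the way the Schema API does for its own commands; nothing here is an existing Collections API, and the helper name {{bulk_add_replicas}} is made up for illustration.

```python
import json


def bulk_add_replicas(shards, node):
    """Build one bulk-style payload that adds a replica on `node` for every shard.

    Assumption: the proposed endpoint accepts a list under the "add-replica" key.
    """
    commands = [{"shard": shard, "node": node} for shard in shards]
    return json.dumps({"add-replica": commands}, indent=2)


if __name__ == "__main__":
    # The resulting JSON would be POSTed to /solr/gettingstarted/admin.
    print(bulk_add_replicas(["shard1", "shard2"], "192.168.0.1_solr"))
```

The point is that a single POST carries both commands, so the client does not need one round trip per shard.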

In general, no operation that changes the state of Solr should accept GET requests; see SOLR-1523.

2) Move all of the cluster-level API actions currently in the collections API to a cluster API. Specifically, the following actions should not be in the "collections" API:

/admin/collections?action=CLUSTERPROP: Add/edit/delete a cluster-wide property 
/admin/collections?action=ADDROLE: Add a specific role to a node in the cluster 
/admin/collections?action=REMOVEROLE: Remove an assigned role 
/admin/collections?action=OVERSEERSTATUS: Get status and statistics of the overseer 
/admin/collections?action=CLUSTERSTATUS: Get cluster status 

In addition, we should add a node status API endpoint similar to what is reported by {{bin/solr status}}, i.e.

{code}
Solr process 81705 running on port 7574
{
  "solr_home":"/Users/timpotter/dev/lw/projects/br5x/solr/example/cloud/node2/solr/",
  "version":"5.3.0-SNAPSHOT 1689511 - timpotter - 2015-07-06 16:00:47",
  "startTime":"2015-07-06T22:36:38.322Z",
  "uptime":"0 days, 21 hours, 39 minutes, 50 seconds",
  "memory":"88 MB (%17.9) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"localhost:9983",
    "liveNodes":"2",
    "collections":"3"}}
{code}

NOTE: The JSON returned by the node status API should use a consistent naming style for keys; the Schema / Config APIs use lowercase names separated by dashes (e.g. {{add-field}}) rather than camel case. Whichever we choose, it needs to be consistent across all requests / responses returned by Solr.
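
To make the naming point concrete, here is a small sketch that rewrites the keys of a status document into the dashed style. It is only an illustration of what "consistent" could look like if the dashed style were chosen; the function names are hypothetical and the conversion rule (dash before an upper-case letter that follows a lower-case letter or digit, underscores to dashes) is my assumption.

```python
import re


def to_dashed(name):
    """Convert camelCase or snake_case to lowercase-with-dashes."""
    # Insert a dash before an upper-case letter that follows a lower-case letter or digit.
    dashed = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name)
    return dashed.replace("_", "-").lower()


def normalize_keys(value):
    """Recursively apply to_dashed to every dict key; leave values untouched."""
    if isinstance(value, dict):
        return {to_dashed(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(v) for v in value]
    return value


if __name__ == "__main__":
    print(normalize_keys({"solr_home": "/var/solr", "startTime": "2015-07-06T22:36:38.322Z"}))
```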

3) The CLUSTERSTATUS action takes optional collection / shard parameters, which should be migrated under a collection-specific endpoint, such as:

{{/solr/<collection>/status}}

Integrate the healthcheck code in the SolrCLI with the {{/solr/<collection>/status}} action so that the healthcheck is available to all clients and not just from the command-line.

4) Sending a GET request to the {{/solr/<collection>}} endpoint should return 200 (exists) or 404 (not found). The body could also return basic metadata (as JSON) about the specified collection if it exists. This also helps fix the issue of determining if a collection already exists. Currently, users have to either iterate over the list of collections to determine existence or use the CLUSTERSTATUS command with the collection parameter, neither of which is as intuitive as sending a GET request to a collection resource.

Alternatively, rather than having a separate status endpoint for a collection, we could just return the status information for the collection for a GET request to {{/solr/<collection>}}. We can use a query string parameter to allow users to control how much status information should be returned as things like the healthcheck are not free to execute so should only be done when requested. For instance:

GET /solr/<collection> returns 200 or 404
GET /solr/<collection>?status=true returns status information in the response body
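
From the client side, the existence check would then reduce to something like the sketch below. It assumes the proposed 200 / 404 behaviour (which Solr does not guarantee today); {{collection_exists}} and its {{opener}} parameter are hypothetical, the latter only there so the function can be exercised without a running node.

```python
import urllib.error
import urllib.request


def collection_exists(base_url, collection, opener=urllib.request.urlopen):
    """Return True on 200, False on 404, and re-raise any other HTTP error.

    Assumption: GET /solr/<collection> behaves as proposed above.
    """
    url = f"{base_url}/solr/{collection}"
    try:
        with opener(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # 401, 500, etc. are not an answer to "does it exist?"
```

Compare this with walking the collection list or parsing CLUSTERSTATUS output just to answer a yes / no question.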

5) Ability to filter collections from the API based on the following criteria (similar to what the cloud panel enables in the UI):

{{GET /solr/collections}} returns a list of all collection names

or

{{GET /solr/collections?params}} returns a list of collections matching the criteria specified in the additional params.

Filtering criteria could include:
+ name prefix matching (tj*)
+ config name (to show me all the collections that use config xyz)
+ activity level (to show my busiest collections in the past X time range)
+ replica status (to show me all the collections that have replicas that are down | recovering | etc)
+ by node (to show me all the collections that have replicas on a specific node in my cluster)
+ creation date (to show me all the collections created since some date or before some other date)
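
A sketch of how a client might turn such criteria into a request URL follows. The parameter names used in the example ({{namePrefix}}, {{configName}}) are placeholders I picked for illustration; the actual names would need to be agreed on as part of this effort.

```python
from urllib.parse import urlencode


def collections_filter_url(base_url, **criteria):
    """Build the list URL, adding only the criteria that were actually supplied."""
    params = {key: value for key, value in criteria.items() if value is not None}
    url = f"{base_url}/solr/collections"
    # urlencode percent-escapes special characters such as the * in a prefix pattern.
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(collections_filter_url("http://localhost:8983", namePrefix="tj*", configName="xyz"))
```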

6) Deleting a collection should use the DELETE HTTP verb, i.e.

{{DELETE /solr/<collection>}}

This makes it easier to secure.
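
To illustrate the security angle: once state changes are tied to the HTTP verb, an access rule can look at the method alone instead of parsing an {{action}} query parameter. The sketch below is hypothetical on both sides; {{delete_collection_request}} targets the proposed endpoint and {{requires_admin}} stands in for whatever rule a proxy or security layer would apply.

```python
import urllib.request

# Assumption: under the proposal, only these verbs can change state.
STATE_CHANGING_METHODS = {"POST", "PUT", "DELETE"}


def delete_collection_request(base_url, collection):
    """Build (but do not send) a DELETE request for the proposed endpoint."""
    return urllib.request.Request(f"{base_url}/solr/{collection}", method="DELETE")


def requires_admin(method):
    """A verb-only access rule: no need to inspect query parameters."""
    return method.upper() in STATE_CHANGING_METHODS
```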

7) Creating a collection needs to be overhauled. Currently, a user sends a GET request to {{/solr/admin/collections?action=CREATE&args …}}. There are several issues with this:

- GET requests should not be used to change the state of the system; creation should be a PUT or a POST (or both is fine too).
- Long list of query parameters to specify collection parameters
- XML is returned by default using embedded {{<lst>}} elements (confusing)
- collection.configName parameter: numerous issues
- response (besides being XML) makes no sense to a new user (see below, looks like a bunch of mumbo jumbo to me):
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1672</int>
  </lst>
  <lst name="success">
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1497</int>
      </lst>
      <str name="core">foo_shard2_replica1</str>
    </lst>
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1557</int>
      </lst>
      <str name="core">foo_shard1_replica1</str>
    </lst>
  </lst>
</response>
{code}
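
For what it's worth, the only useful facts buried in that response are the overall status and the names of the cores that were created. A sketch of digging them out with the Python standard library, assuming the response has exactly the shape shown above ({{summarize_create_response}} is a name I made up):

```python
import xml.etree.ElementTree as ET


def summarize_create_response(xml_text):
    """Reduce the nested <lst> response to the overall status and created core names."""
    root = ET.fromstring(xml_text)
    # Top-level responseHeader holds the overall status (0 means success).
    status = int(root.find("./lst[@name='responseHeader']/int[@name='status']").text)
    # Each unnamed <lst> under "success" describes one created core.
    cores = [el.text for el in root.findall("./lst[@name='success']/lst/str[@name='core']")]
    return {"status": status, "cores": cores}
```

A JSON body carrying just those two pieces of information would be much easier for a new user to read.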

Introduce an approach that will appeal to the REST lovers among us that accepts a JSON definition of a collection where you can POST to {{/solr/collections/}}, such as:

{code}
curl -XPOST -H 'Content-type: application/json' -d '{"name":"golf","numShards":2,"configName":"foo"}' http://localhost:8080/solr/collections/
{code}

POST should be used instead of PUT as PUT requests are intended / expected to be idempotent. At this point, the {{/collections}} endpoint is solely used to handle creation and list/find collection requests. The API should use sensible defaults for numShards and replicationFactor, as is done currently by the {{bin/solr create -c}} command, because a new user may not really understand these the first time they use Solr. The response is either 201 (created) or an error code and explanation (in JSON).
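
A client-side sketch of the same idea, with the defaults applied before the request is built. The default values of 1 for both settings are my assumption for illustration, as is the {{replicationFactor}} key in the JSON body; {{create_collection_request}} is a hypothetical helper and the endpoint is the proposed one, not something Solr exposes today.

```python
import json
import urllib.request


def create_collection_request(base_url, name, num_shards=1, replication_factor=1,
                              config_name=None):
    """Build (but do not send) a POST carrying a JSON collection definition."""
    definition = {
        "name": name,
        "numShards": num_shards,                  # assumed default: 1
        "replicationFactor": replication_factor,  # assumed default: 1
    }
    if config_name is not None:
        definition["configName"] = config_name
    return urllib.request.Request(
        f"{base_url}/solr/collections/",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
```

A caller would pass the request to {{urllib.request.urlopen}} and treat 201 as success.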

There are obviously more issues to be dealt with around collection configs, but I'll address those in another ticket. The point here is to clean up how create works.

8) We need a collection-level metrics API endpoint. SolrCloud doesn't provide any aggregate stats about the cluster or a collection. Very common questions such as document counts per shard, index sizes, request rates, etc. cannot be answered easily without figuring out the cluster state, invoking multiple core admin APIs and aggregating them manually; see SOLR-6325.
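
To show the kind of manual aggregation users are stuck with today, here is a sketch that rolls per-core numbers up to shard and collection level. The input shape (a list of dicts with {{shard}}, {{numDocs}} and {{sizeInBytes}}) is invented for the example and would have to be assembled from cluster state plus per-core calls. It also assumes replicas of a shard hold the same documents, so the per-shard count is the maximum across replicas rather than the sum.

```python
from collections import defaultdict


def aggregate_collection_stats(core_stats):
    """Aggregate hypothetical per-core stats into per-shard and collection totals."""
    docs = defaultdict(int)
    size = defaultdict(int)
    for core in core_stats:
        shard = core["shard"]
        # Replicas duplicate the shard's documents, so take the max, not the sum.
        docs[shard] = max(docs[shard], core["numDocs"])
        # Every replica occupies its own disk space, so sizes do add up.
        size[shard] += core["sizeInBytes"]
    return {
        "docsPerShard": dict(docs),
        "totalDocs": sum(docs.values()),
        "indexBytesPerShard": dict(size),
    }
```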





> REST based Collections API
> --------------------------
>
>                 Key: SOLR-5606
>                 URL: https://issues.apache.org/jira/browse/SOLR-5606
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Jan Høydahl
>            Priority: Minor
>             Fix For: Trunk
>
>
> For consistency reasons, the collections API (and other admin APIs) should be REST based. Spinoff from SOLR-1523



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
