Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2014/08/14 19:38:13 UTC

[jira] [Commented] (SOLR-6249) Schema API changes return success before all cores are updated

    [ https://issues.apache.org/jira/browse/SOLR-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097263#comment-14097263 ] 

Timothy Potter commented on SOLR-6249:
--------------------------------------

Going to start working on this ... a few initial thoughts:

ZkIndexSchemaReader has a ZK watcher to receive notification when the schema is updated, so we're talking about a very small window of time between the updated schema being written and all replicas seeing the update. Consequently, I think it's reasonable for the core that accepted the API request to block, with a reasonable timeout, to give all the active replicas time to apply the update. In other words, I don't think we should put the burden on the client to assess the success / failure of the operation across all replicas. I'll need to figure out what to block on, but ZooKeeper's two-phase commit recipe sounds applicable here in that the tx coordinator (the core that accepted the API request) can block until all other replicas ack that they've applied the updates successfully. Alternatively, the coordinator could just poll each active replica for the schema version it is using (along the lines of what Gregory suggested above), with a maximum amount of time the coordinator is allowed to keep polling replicas.
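
To make the polling alternative concrete, here is a rough sketch of what the coordinator loop might look like. Everything in it is hypothetical: SchemaVersionPoller, waitForReplicas and the versionLookup callback are names made up for illustration, not existing Solr classes or APIs, and the way a replica actually reports its schema version is left as an assumption behind the callback.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;

/** Hypothetical helper for illustration only; not part of Solr. */
final class SchemaVersionPoller {

  /**
   * Polls each replica until all of them report a schema version that is
   * at least expectedVersion, or until timeoutMs has elapsed.
   *
   * @param versionLookup assumed callback returning the schema version a replica is using
   * @return the replicas that are still behind (empty if all caught up in time)
   */
  static Set<String> waitForReplicas(Collection<String> replicas,
                                     int expectedVersion,
                                     ToIntFunction<String> versionLookup,
                                     long timeoutMs,
                                     long pollIntervalMs) {
    Set<String> pending = new HashSet<>(replicas);
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      // Drop every replica that has seen the expected version (or a newer one).
      pending.removeIf(r -> versionLookup.applyAsInt(r) >= expectedVersion);
      if (pending.isEmpty()) {
        return pending;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return pending; // timed out; caller decides what to do with the stragglers
      }
      long sleepMs = Math.min(pollIntervalMs,
          Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        return pending;
      }
    }
  }
}
```

Returning the set of lagging replicas, rather than a boolean, leaves room for the coordinator to decide what to do with replicas that never caught up.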

I also think a replica that cannot process the update successfully should be put into the down state (so as to prevent it from receiving update/query requests). The replica should initiate this action itself if the update fails so that we don't have replicas with mixed schemas running in the cluster. I'll need to dig into the ramifications of that, but my thinking here is that if one replica applies the update successfully and another fails, then we probably still want the request to succeed from the client perspective and just put the replica having problems into the down state. I prefer this over the approach where an update must succeed on all "active" replicas or fail entirely. That approach gets us into the realm of distributed transactions for these types of updates, which is hard at this point because the write to ZooKeeper has already occurred (requiring compensating-transaction type solutions where we'd need to back out the write).
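
As a toy model of the failure handling described above (not Solr code): each replica tries to apply the new schema version and marks itself down if that fails, and the request as a whole is treated as successful as long as at least one active replica applied it. The class, enum and method names are all invented for the sketch, and the "at least one replica" success rule is my reading of the idea, not a settled design.

```java
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model for illustration only; names are hypothetical, not Solr APIs. */
final class SchemaUpdateSketch {

  enum State { ACTIVE, DOWN }

  static final class Replica {
    final String name;
    final IntPredicate applySchema; // assumed hook: true if the version was applied
    State state = State.ACTIVE;
    int schemaVersion;

    Replica(String name, IntPredicate applySchema) {
      this.name = name;
      this.applySchema = applySchema;
    }

    /** Called when the replica is notified of a new schema version. */
    void onSchemaChanged(int newVersion) {
      boolean ok;
      try {
        ok = applySchema.test(newVersion);
      } catch (RuntimeException e) {
        ok = false;
      }
      if (ok) {
        schemaVersion = newVersion;
      } else {
        // The replica takes itself out of rotation rather than serve a stale schema.
        state = State.DOWN;
      }
    }
  }

  /**
   * Delivers the new version to every active replica.
   * Returns true if at least one replica applied it (an assumed success rule).
   */
  static boolean distribute(List<Replica> replicas, int newVersion) {
    int applied = 0;
    for (Replica r : replicas) {
      if (r.state != State.ACTIVE) {
        continue; // replicas that are already down are skipped
      }
      r.onSchemaChanged(newVersion);
      if (r.state == State.ACTIVE) {
        applied++;
      }
    }
    return applied > 0;
  }
}
```

Note there is no rollback path here: a failed replica is simply left in the down state, which is the point of avoiding the all-or-nothing approach.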

> Schema API changes return success before all cores are updated
> --------------------------------------------------------------
>
>                 Key: SOLR-6249
>                 URL: https://issues.apache.org/jira/browse/SOLR-6249
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis, SolrCloud
>            Reporter: Gregory Chanan
>
> See SOLR-6137 for more details.
> The basic issue is that Schema API changes return success when the first core is updated, but other cores asynchronously read the updated schema from ZooKeeper.
> So a client application could make a Schema API change and then index some documents based on the new schema that may fail on other nodes.
> Possible fixes:
> 1) Make the Schema API calls synchronous
> 2) Give the client some ability to track the state of the schema.  They can already do this to a certain extent by checking the Schema API on all the replicas and verifying that the field has been added, though this is pretty cumbersome.  Maybe it makes more sense to do this sort of thing on the collection level, i.e. Schema API changes return the zk version to the client.  We add an API to return the current zk version.  On a replica, if the zk version is >= the version the client has, the client knows that replica has at least seen the schema change.  We could also provide an API to do the distribution and checking across the different replicas of the collection so that clients don't need to do that themselves.


