Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2018/02/27 21:45:32 UTC

[GitHub] kocolosk commented on issue #1139: Fix for issue #1136 - Error 500 deleting DB without quorum

URL: https://github.com/apache/couchdb/pull/1139#issuecomment-369037334

Thanks for this PR @jjrodrig, very well done.

The orphaned files are a known condition. They are OK from the perspective of database correctness, as a new database created with the same name will have a different creation timestamp and so the old data would not be surfaced. It could be a useful future enhancement to remove orphaned shard files in a background process.

The previous behavior of distinguishing between a majority and a minority of committed updates to the replicas of the shard table is not something I'm interested in preserving. I think we can do better.

I *do* think the use of 202 Accepted as an indicator to the client that "hey, things are a little messy right now, you might see surprises while we work things out" is a good thing. My concern with the current PR is that we may return 200 OK to a user even though some nodes in the cluster still host the old (soon-to-be-deleted) version of the database. For example, consider the following sequence of events:

1. Cluster experiences a network partition which creates subsets A and B
2. User submits `DELETE /foo` to a node in A and receives `200 OK`
3. User submits `PUT /foo` to a node in A and receives `200 OK`
4. User submits `PUT /foo/bar` to a node in B and receives `202 Accepted`
5. Network partition resolves, shard maps are updated

In this sequence, wouldn't the `/foo/bar` document be lost permanently?

Perhaps a preferable approach is to use `202 Accepted` for every situation in which

1. We get at least one acknowledgement and
2. We do not hear back from at least one cluster member

The downside of this approach is that we wait around to hear back from every cluster member, and the `request_timeout` can be quite large. But I think we need to be quite careful about using `200 OK` in situations where some cluster member could still be accepting new data in a shard on death row.
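To make the proposed rule concrete, here is a minimal Python sketch of the status decision, not the actual CouchDB (Erlang) implementation. The function name `delete_status` and the `acked` mapping are hypothetical, and the `500` returned when no node acknowledges is an assumption of mine; the rule above does not cover that case.

```python
def delete_status(acked):
    """Choose an HTTP status for DELETE /db (hypothetical helper).

    `acked` maps each cluster member to True if it acknowledged the
    shard-map update, or False if nothing was heard from it before
    the request timeout expired.
    """
    acks = sum(1 for ok in acked.values() if ok)
    silent = len(acked) - acks

    if acks == 0:
        # Assumption: with no acknowledgement at all, report an error.
        return 500
    if silent > 0:
        # At least one ack, but some member may still host the old shards.
        return 202
    # Every member confirmed the deletion.
    return 200


# Example: a partition leaves node "c" unreachable.
print(delete_status({"a": True, "b": True, "c": False}))
```

Note that the caller can only build the `acked` mapping after waiting for every member or for the timeout, which is the cost described above.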

What do you think?
