Posted to user@couchdb.apache.org by David Alan Hjelle <da...@thehjellejar.com> on 2018/04/30 14:25:06 UTC

Process for restoring failed node in a cluster?

(Feel free to point me to resources to answer this question, but I haven’t seen a definitive answer yet.)

What’s the recommended process for restoring a failed node in a cluster?

It appears that the process would be:

1. Rebuild a node with the same configuration and node name as the failed node, and bring it up on the network.
2. Since the node isn't automatically recognized, go to an *existing* node, GET the revision of its `:5986/_nodes/couchdb@X.X.X.X` document, and then PUT a new version of that document, e.g. `curl -X PUT "http://admin:password@127.0.0.1:5986/_nodes/couchdb@X.X.X.X?rev=1-967a00dff5e02add41819138abb3284d" -d '{}'` (a sketch of this sequence follows the list).
3. While the documents are synced automatically, the views are built per node, so they need to be refreshed manually for each database.
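
For concreteness, here is roughly what I have in mind for steps 2 and 3. The node name couchdb@X.X.X.X, the admin:password credentials, and the mydb/_design/mydesign/_view/myview names are just placeholders, and I'm assuming the default node-local (5986) and clustered (5984) ports:

    # Step 2: from a surviving node, check whether a _nodes document for the
    # rebuilt node already exists and note its _rev
    curl "http://admin:password@127.0.0.1:5986/_nodes/couchdb@X.X.X.X"

    # ...then PUT it back (include ?rev=... only if a revision was returned)
    # so the cluster re-registers the node
    curl -X PUT "http://admin:password@127.0.0.1:5986/_nodes/couchdb@X.X.X.X?rev=1-967a00dff5e02add41819138abb3284d" -d '{}'

    # Step 3: warm the views by querying each one; the query blocks until the
    # index is up to date, and limit=0 keeps the response small. A clustered
    # query may only update one replica of each shard range, so it might need
    # repeating before the recovered node's copies are rebuilt.
    curl "http://admin:password@127.0.0.1:5984/mydb/_design/mydesign/_view/myview?limit=0"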

Does that cover it? If I'm running a 3-node cluster with n=3, could I avoid step 3 by copying over the `shards` and `.shards` directories from another node, since all nodes have identical copies of the data?
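
If copying the files is viable, I imagine the mechanical part would look something like the following; the /opt/couchdb/data path, the othernode hostname, and the couchdb user/group are assumptions about a default packaged install, not something I've verified:

    # stop CouchDB on the rebuilt node before touching its data directory
    sudo systemctl stop couchdb

    # copy the shard files and the view index files from a healthy node
    # (ideally while that node is quiesced, or from a snapshot)
    rsync -a othernode:/opt/couchdb/data/shards/  /opt/couchdb/data/shards/
    rsync -a othernode:/opt/couchdb/data/.shards/ /opt/couchdb/data/.shards/

    # restore ownership and bring the node back up
    sudo chown -R couchdb:couchdb /opt/couchdb/data
    sudo systemctl start couchdb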

Thanks for your help!

David Alan Hjelle