Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/09/16 16:31:13 UTC

[GitHub] [couchdb-helm] tudordumitriu commented on issue #40: Cluster auto-scaling best practices

tudordumitriu commented on issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40#issuecomment-693520522


   Thank you @willholley! Truly appreciate it
   
   Sorry for not being 100% within scope, but since the final goal is to deploy it within a cluster, it made some sense to address it here (and honestly I didn't know where else to go).
   So, bottom line: because of the complexity of the process, this job cannot be fully automated, and we should try to estimate the loads and anticipate the timings as best we can.
   
   When time comes (loose terms warning; a rough code sketch of these steps follows the list):
   1. We add a new node to our k8s cluster
   2. We update the StatefulSet replica count (the new pod WON'T be added to the CouchDB cluster yet)
   3. We switch the new CouchDB node to maintenance mode (with the appropriate settings - not 100% sure how this step can be scripted, would appreciate a hint)
   4. We wait for the sync jobs to finish (which might take a while), because, as you said, copying the data by hand doesn't make sense and might be error prone
   5. We take the node out of maintenance mode
   6. We add the node to the CouchDB cluster
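   
   To make this concrete, here's a minimal sketch of how I imagine steps 2-6 could be driven against CouchDB's HTTP API (2.x+). The credentials, pod/service names, and especially the "caught up" check are my own placeholders, not something from this chart:
   
   ```python
   import time
   
   import requests
   
   ADMIN_AUTH = ("admin", "password")  # placeholder credentials
   # Placeholder addresses: any existing cluster member and the new pod
   CLUSTER_URL = "http://couchdb-0.couchdb:5984"
   NEW_NODE_URL = "http://couchdb-3.couchdb:5984"
   NEW_NODE_NAME = "couchdb@couchdb-3.couchdb"  # Erlang name of the new node
   
   
   def set_maintenance_mode(node_url, enabled):
       # Steps 3 and 5: maintenance mode is a config key; the value is the
       # JSON string "true" or "false".
       resp = requests.put(
           node_url + "/_node/_local/_config/couchdb/maintenance_mode",
           json="true" if enabled else "false",
           auth=ADMIN_AUTH,
       )
       resp.raise_for_status()
   
   
   def node_caught_up(node_url):
       # Step 4: detecting "sync finished" is the part I'm unsure about; this
       # check is a hypothetical stand-in (one could e.g. compare per-shard
       # doc counts across replicas instead).
       return requests.get(node_url + "/", auth=ADMIN_AUTH).ok
   
   
   def join_cluster(node_name):
       # Step 6: registering the node in the _nodes DB makes it a member.
       resp = requests.put(
           CLUSTER_URL + "/_node/_local/_nodes/" + node_name,
           json={},
           auth=ADMIN_AUTH,
       )
       resp.raise_for_status()
   
   
   # Step 2 happens outside this script, e.g.:
   #   kubectl scale statefulset couchdb --replicas=4
   set_maintenance_mode(NEW_NODE_URL, True)   # step 3
   while not node_caught_up(NEW_NODE_URL):    # step 4
       time.sleep(30)
   set_maintenance_mode(NEW_NODE_URL, False)  # step 5
   join_cluster(NEW_NODE_NAME)                # step 6
   ```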
   
   I still have some questions (some maybe out of scope as well):
   1. Should we have an odd number of nodes in the cluster (I've noticed occasional strange write errors due to quorum not being met)? That would mean we have to do the above for 2 extra nodes. (Some quorum arithmetic on this below the list.)
   2. Is the k8s Service of type LoadBalancer enough to handle load distribution for a CouchDB cluster deployed as a StatefulSet?
   3. I've noticed serious performance differences between running a single-node cluster and a 3-node cluster (on the same k8s node, though): the single node performs way better. I guess that's because the nodes need to keep in sync and lots of internal replication calls are being made, so basically there should be one k8s node for each CouchDB node (considering that the CouchDB containers are quite heavy CPU consumers in load tests).
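   
   Regarding question 1, here's the quorum arithmetic as I understand it (a sketch assuming CouchDB's defaults - n copies per shard with majority read/write quorums - not anything verified against this chart):
   
   ```python
   # With n copies of each shard, the default read/write quorum is a
   # majority: floor(n/2) + 1 (assuming defaults; r and w can be overridden).
   def quorum(n):
       return n // 2 + 1
   
   for n in (1, 2, 3):
       print("n=%d -> quorum=%d" % (n, quorum(n)))
   # n=1 -> quorum=1, n=2 -> quorum=2, n=3 -> quorum=2.
   # With n=2 both copies must answer, so losing either one fails writes -
   # which would explain the quorum errors and the usual odd-number advice.
   ```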
   
   Thanks again!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org