Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/09/16 10:56:25 UTC

[GitHub] [couchdb-helm] tudordumitriu opened a new issue #40: Cluster auto-scaling best practices

tudordumitriu opened a new issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40


   Hi guys
   I think this chart is very useful in a k8s cluster, but (and maybe it's just me missing something) the community/this chart seems to be missing support / best practices for auto-scaling, i.e. increasing the number of CouchDB nodes inside a k8s cluster.
   
   It's quite clear that all of us want to deploy within a k8s cluster, in a cloud, because we can scale out (and in) based on various metrics.
   We have been working on an Azure AKS setup with SSDs as data storage for CouchDB and our business services.
   What the stress testing revealed is that CouchDB, in our case at least, uses the CPU intensively, and we want to be prepared for such bursts in an automated way.
   The obvious solution is to use the Cluster Autoscaler + Horizontal Pod Autoscaler, so that we can add (and remove) a node and a pod (or pods) on demand.
   
   But the problem is (and this is where I might be wrong) that the CouchDB cluster needs to be updated manually.
   More than that, if we have a large amount of data, how do we properly set up the new node so that it is "warm" when it is added to the cluster - meaning physically replicating the data drive, if that's even an option, so that the cluster itself won't have to sync internally, which from our experiments seems to use quite some resources.
   I went through the CouchDB docs, the CouchDB Helm chart files and various other documentation sources, and I wasn't able to find any automated way of doing this.
   We are setting up the cluster via calls to the HTTP /_cluster_setup endpoint, which is fine if we do it manually, but if autoscaling happens automatically, the new node is basically of no use until it is added to the cluster, manually.
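   For reference, a minimal sketch of the kind of `/_cluster_setup` call involved when joining a node; the hostnames, credentials and node names below are placeholders, not values from this setup:

   ```python
   import requests

   # Placeholder values -- adjust to the actual deployment.
   COORDINATOR = "http://couchdb-couchdb-0.couchdb-couchdb.default.svc.cluster.local:5984"
   AUTH = ("admin", "password")
   NEW_NODE_HOST = "couchdb-couchdb-3.couchdb-couchdb.default.svc.cluster.local"

   # Ask the setup coordinator node to join the new node to the cluster.
   resp = requests.post(
       f"{COORDINATOR}/_cluster_setup",
       auth=AUTH,
       json={
           "action": "add_node",
           "host": NEW_NODE_HOST,
           "port": 5984,
           "username": AUTH[0],
           "password": AUTH[1],
       },
   )
   resp.raise_for_status()

   # Confirm the node now appears in the cluster membership.
   print(requests.get(f"{COORDINATOR}/_membership", auth=AUTH).json())
   ```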
   
   So, if possible, please share with us any best practices or even mechanisms that could help automate this job.
   Thanks
   





[GitHub] [couchdb-helm] willholley edited a comment on issue #40: Cluster auto-scaling best practices

Posted by GitBox <gi...@apache.org>.
willholley edited a comment on issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40#issuecomment-693410905


   @tudordumitriu I think this is outside the scope of the Helm chart, though I can see how it's related. I would be extremely wary of using an autoscaler to automate database cluster scaling - the chance of data loss is high and the timescales and nuances involved in moving shards around likely require manual supervision anyway.
   
   In the general case, this is a difficult problem, which is probably why there are no community tools to address it (Kubernetes or not). One tool you could look at is [couchdb-admin](https://github.com/cabify/couchdb-admin) from Cabify. I haven't used it personally, but it appears to automate at least some of the management tasks.
   
   Unfortunately, the process for moving shards described in [the docs](https://docs.couchdb.org/en/stable/cluster/sharding.html#moving-a-shard) is tricky in Kubernetes because not many storage backends support `ReadWriteMany` [access modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes). You could try exposing port 22 between the CouchDB pods and using SCP. You're likely better off using CouchDB internal replication to create additional shard replicas instead of moving files/indexes directly, but it's a slower process.
   
   Adding a node would require something like:
   
    * Join the new node to the cluster.
    * Put the new node in [maintenance mode](https://docs.couchdb.org/en/stable/config/couchdb.html#couchdb/maintenance_mode).
    * For each database in the cluster:
      * Figure out the new shard map to ensure an even spread of shards and replicas across machines/availability zones. For your new node, add the desired shard ranges/replicas to each shard map. This temporarily increases the number of replicas of each shard. For large numbers of databases/shards, you might need to serialize this process.
        * This should result in CouchDB internal replication creating and populating the database shards on the new node. When design documents are replicated, indexing jobs will trigger on the new node to create the required indexes. For large shards, this may take some time (hours/days).
      * Monitor the internal replication backlog and indexing process to observe when the replication and indexing process is complete (see the sketch after this list). I'm not sure if metrics for these are exposed by `_stats` or whether you'd need to parse the log output (e.g. look for `[warning] <date> <existing node> <pid> -------- mem3_sync shards/<shard range>/<db name>.<epoch> <new node> {pending_changes,<value>}`) to determine the internal replication backlog. Indexing backlog can be queried using the `_active_tasks` endpoint.
      * If/when internal replication and indexing is up to date, take the node out of [maintenance mode](https://docs.couchdb.org/en/stable/config/couchdb.html#couchdb/maintenance_mode).
      * Update the shard map to remove the original replicas that are now redundant.
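   A minimal sketch of how the maintenance-mode toggle and indexing-backlog check above could be driven via the HTTP API; node names and credentials are placeholders, and (as noted) the internal replication backlog may still have to be derived from the logs:

   ```python
   import time
   import requests

   # Placeholder values -- adjust to the actual deployment.
   BASE = "http://couchdb-couchdb-0.couchdb-couchdb.default.svc.cluster.local:5984"
   AUTH = ("admin", "password")
   NEW_NODE = "couchdb@couchdb-couchdb-3.couchdb-couchdb.default.svc.cluster.local"

   def set_maintenance_mode(node: str, enabled: bool) -> None:
       """Toggle [couchdb] maintenance_mode on one node via the config API."""
       url = f"{BASE}/_node/{node}/_config/couchdb/maintenance_mode"
       requests.put(url, auth=AUTH, json="true" if enabled else "false").raise_for_status()

   def indexing_backlog(node: str) -> int:
       """Number of indexer tasks /_active_tasks currently reports for the node."""
       tasks = requests.get(f"{BASE}/_active_tasks", auth=AUTH).json()
       return sum(1 for t in tasks if t.get("type") == "indexer" and t.get("node") == node)

   set_maintenance_mode(NEW_NODE, True)
   # ... update the shard maps so internal replication populates the new node ...
   while indexing_backlog(NEW_NODE) > 0:
       time.sleep(60)
   set_maintenance_mode(NEW_NODE, False)
   ```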
   
   Removing a node from the cluster would be a similar process, updating the shard map to ensure enough shard replicas exist on the remaining nodes before removing it.
   
   


[GitHub] [couchdb-helm] gpothier edited a comment on issue #40: Cluster auto-scaling best practices

Posted by GitBox <gi...@apache.org>.
gpothier edited a comment on issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40#issuecomment-933572542


   Hi, mostly thinking out loud here, but would the following be a valid scaling strategy?
   1. Initially deploy the CouchDB cluster with a large number of CouchDB nodes (i.e. k8s pods), configuring their resource allocations quite low so that many pods can run on a single k8s node.
   2. When usage increases (as measured by global CPU utilization), increase the resource allocations of the pods so that they get spread out to more (and possibly more powerful) k8s nodes (a rough sketch of such a patch follows below).
   3. Conversely, when usage decreases, reduce the resource allocations so that pods can regroup onto fewer k8s nodes.
   
   This way, there is no need for resharding. However (and please note I am a k8s beginner), I don't think this "migration" of pods to other nodes when their resource allocations change would be automatic, so it would probably require killing the pods to force them to be recreated elsewhere.
   
   EDIT: I just realized that changing the resource requests of pods according to actual usage and migrating them to other k8s nodes is the Vertical Pod Autoscaler's job, so it seems scaling could be achieved by implementing point 1 above and properly configuring a Vertical Pod Autoscaler (and a Cluster Autoscaler).
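   For reference, a minimal sketch of what bumping the pod resource requests in point 2 could look like with the Kubernetes Python client; the StatefulSet/container names and resource values are placeholders, and a properly configured Vertical Pod Autoscaler would apply equivalent changes automatically:

   ```python
   from kubernetes import client, config

   config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
   apps = client.AppsV1Api()

   # Strategic-merge patch raising the CPU/memory requests of the CouchDB container.
   # "couchdb-couchdb" / "couchdb" are placeholder names for a release called "couchdb".
   patch = {
       "spec": {
           "template": {
               "spec": {
                   "containers": [
                       {
                           "name": "couchdb",
                           "resources": {"requests": {"cpu": "2", "memory": "4Gi"}},
                       }
                   ]
               }
           }
       }
   }
   apps.patch_namespaced_stateful_set("couchdb-couchdb", "default", patch)
   ```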





[GitHub] [couchdb-helm] willholley commented on issue #40: Cluster auto-scaling best practices

Posted by GitBox <gi...@apache.org>.
willholley commented on issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40#issuecomment-694116774


   You need to add the node to the cluster (step 6) before putting it in maintenance mode and moving shards around.
   
   Regarding the questions:
   
   > 1. Should we be having an odd number of nodes within a cluster (I've noticed from time to time strange write errors due to quorum not being met) - does that mean we have to do the above for 2 extra nodes?
   
   Multiples of 3 are usually simplest because shard distribution is then equal, but it's not required.
   
   > 2. Is the k8s service type LoadBalancer enough to handle the load management for the CouchDB cluster deployed as a StatefulSet?
   
   Yes - no problems with the k8s Service LoadBalancer (it's just iptables/IPVS). If you expose CouchDB externally using an Ingress, performance will depend on which Ingress implementation you use, etc.
   
   > 3. I've noticed serious performance differences between running a single-node cluster and a 3-node cluster (on the same k8s node though), meaning that the single node performs way better (and I guess that's due to the fact the nodes need to keep in sync and lots of replication calls are being made), so basically there should be one k8s node for each CouchDB node (considering that the CouchDB containers are quite heavy CPU consumers in load tests)
   
   Yes - there's not really any benefit in having more than one CouchDB node per worker. The only exception I can think of is that you could "oversize" the cluster initially and then spread out CouchDB nodes amongst machines as you grow without needing to go through the cluster expansion steps described above, assuming you use remote storage.
   





[GitHub] [couchdb-helm] tudordumitriu commented on issue #40: Cluster auto-scaling best practices

Posted by GitBox <gi...@apache.org>.
tudordumitriu commented on issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40#issuecomment-693520522


   Thank you @willholley! Truly appreciate it
   
   Sorry for not being 100% within scope, but since the final goal is to deploy it within a cluster it made some sense to address it here (and honestly I didn't know where else to go).
   So, bottom line: because of the complexity of the process this job cannot be automated, and we should try to estimate the loads and anticipate the timings as best as we can.
   
   When the time comes (loose terms warning):
   1. We add a new node to our k8s cluster
   2. Update the StatefulSet replica count (the new node WON'T be added to the CouchDB cluster yet) - a rough sketch of scripting this step follows after this list
   3. We switch the new CouchDB node to maintenance mode (with appropriate settings - not 100% sure how the process can be serialized, would appreciate a hint)
   4. Wait for the sync jobs to finish (which might take a while), because as you said copying the data directly doesn't make sense and might be error prone
   5. Take the node out of maintenance mode
   6. Add the node to the cluster
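   A minimal sketch of scripting step 2 (growing the StatefulSet by one replica) with the Kubernetes Python client; the release and namespace names are placeholders:

   ```python
   from kubernetes import client, config

   config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
   apps = client.AppsV1Api()

   # Placeholder names for a chart release called "couchdb" in the "default" namespace.
   NAME, NAMESPACE = "couchdb-couchdb", "default"

   # Add one replica; the new pod runs CouchDB but is not part of the CouchDB cluster
   # until it is joined explicitly (e.g. via /_cluster_setup in step 6).
   scale = apps.read_namespaced_stateful_set_scale(NAME, NAMESPACE)
   apps.patch_namespaced_stateful_set_scale(
       NAME, NAMESPACE, {"spec": {"replicas": scale.spec.replicas + 1}}
   )
   ```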
   
   I still have some questions (some maybe out of scope as well):
   1. Should we be having an odd number of nodes within a cluster (I've noticed from time to time strange write errors due to quorum not being met) - does that mean we have to do the above for 2 extra nodes?
   2. Is the k8s service type LoadBalancer enough to handle the load management for the CouchDB cluster deployed as a StatefulSet?
   3. I've noticed serious performance differences between running a single-node cluster and a 3-node cluster (on the same k8s node though), meaning that the single node performs way better (and I guess that's due to the fact the nodes need to keep in sync and lots of replication calls are being made), so basically there should be one k8s node for each CouchDB node (considering that the CouchDB containers are quite heavy CPU consumers in load tests)
   
   Thanks again!





[GitHub] [couchdb-helm] tudordumitriu closed issue #40: Cluster auto-scaling best practices

Posted by GitBox <gi...@apache.org>.
tudordumitriu closed issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40


   





[GitHub] [couchdb-helm] tudordumitriu commented on issue #40: Cluster auto-scaling best practices

Posted by GitBox <gi...@apache.org>.
tudordumitriu commented on issue #40:
URL: https://github.com/apache/couchdb-helm/issues/40#issuecomment-694691025


   Thanks again!

