Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/09/13 21:01:15 UTC

[GitHub] [solr-operator] HoustonPutman commented on issue #471: How to prevent node rotation behavior from causing cluster instability

HoustonPutman commented on issue #471:
URL: https://github.com/apache/solr-operator/issues/471#issuecomment-1245949988

   This is a very good callout, so thank you for bringing it up.
   
   We can easily add a PodDisruptionBudget for the entire SolrCloud cluster, with `maxUnavailable` populated from the `SolrCloud.spec.updateStrategy.managed.maxPodsUnavailable` value. This is a good first step and gets us halfway there.
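   For illustration, a cluster-wide PDB along those lines might look like the following. This is a sketch only: the cloud name `example` and the label key are placeholders, not necessarily the exact labels the operator applies to its pods.

   ```yaml
   # Hypothetical cluster-wide PDB for a SolrCloud named "example".
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: example-solrcloud-pdb
   spec:
     # Would be populated from
     # SolrCloud.spec.updateStrategy.managed.maxPodsUnavailable
     maxUnavailable: 1
     selector:
       matchLabels:
         solr-cloud: example   # illustrative label key/value
   ```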
   
   The next half would be replicating the `SolrCloud.spec.updateStrategy.managed.maxShardReplicasUnavailable` functionality through PDBs. Through the managed update code, we already know which nodes each shard resides on, so it wouldn't be far-fetched to create a PDB for every shard, using a custom labelSelector to pick out the node-name labels of the nodes we know host that shard. Since we aren't listening to cluster state changes, we could simply reconcile on a schedule (every minute or so), creating, updating, and deleting these PDBs as needed. The [PodDisruptionBudget documentation](https://kubernetes.io/docs/tasks/run-application/configure-pdb/#arbitrary-controllers-and-selectors) tells us that we can't use `maxUnavailable` here, as PDBs with custom labelSelectors can only use an int-valued `minAvailable`. That's fine, because we can always convert between the two, since we know the number of nodes that host each shard.
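   The conversion mentioned above is simple arithmetic. A hypothetical helper (the function name and the rounding choice for percentage values are assumptions, not operator code) could look like:

   ```python
   def min_available(num_shard_nodes: int, max_unavailable) -> int:
       """Convert a maxUnavailable value (int, or a percentage string like
       "25%") into the int-valued minAvailable that a PDB with a custom
       labelSelector requires, given the number of nodes hosting the shard."""
       if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
           # Assumed rounding: round the unavailable count down, which keeps
           # the resulting budget conservative (more pods must stay up).
           unavailable = num_shard_nodes * int(max_unavailable[:-1]) // 100
       else:
           unavailable = int(max_unavailable)
       # minAvailable can never be negative.
       return max(num_shard_nodes - unavailable, 0)
   ```

   For example, with 4 nodes hosting a shard and `maxShardReplicasUnavailable: 1`, the shard-level PDB would carry `minAvailable: 3`.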
   
   However, there's [another rule](https://kubernetes.io/docs/tasks/run-application/configure-pdb/#arbitrary-controllers-and-selectors) for PDBs that makes this part of the solution untenable. It specifies that a pod can only be covered by one `PodDisruptionBudget`, and this solution would need a PDB for every shard that lives on a pod, which will almost certainly be more than one. (The general cluster-wide PDB, by contrast, is fine to use.)
   
   Hopefully Kubernetes will eventually remove the one-PDB-per-pod limit; then we can fully implement shard-level PDBs managed by the Solr Operator without too much difficulty. In the meantime, we should go ahead and implement the per-cluster `PodDisruptionBudget` and fill it with the value used in the managed update settings.
   
   Given the limitations, what are your thoughts on moving forward with the cluster-level PDB @joshsouza ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

