Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/12/13 13:55:29 UTC

[GitHub] [solr-operator] janhoy opened a new issue, #504: Probes for readiness and liveness should be different

janhoy opened a new issue, #504:
URL: https://github.com/apache/solr-operator/issues/504

   We are using Kyverno to enforce certain common k8s policies, and it complains about several things in the solr-operator.
   
   One of them is this policy: https://kyverno.io/policies/other/ensure_probes_different/ensure_probes_different/
   
   We have auth enabled, so the operator uses its alternative shell command, which is identical for the liveness and readiness probes. Is this on purpose? Could we use different commands?
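The Kyverno policy linked above flags containers whose liveness and readiness probes are the same. The real policy is written in Kyverno's YAML rule language; the Python below is only a hedged sketch of the comparison it performs, assuming the two probe objects are compared as a whole. The function names, the container name, and the placeholder command are hypothetical, not taken from Kyverno or the solr-operator.

```python
from typing import Any


def probes_identical(container: dict[str, Any]) -> bool:
    """True if the container defines both probes and they are exactly equal.

    Assumption: whole probe objects are compared, as a simplified stand-in
    for the Kyverno rule; the real policy may compare fields differently.
    """
    liveness = container.get("livenessProbe")
    readiness = container.get("readinessProbe")
    if liveness is None or readiness is None:
        return False  # nothing to compare when a probe is missing
    return liveness == readiness


def violations(pod_spec: dict[str, Any]) -> list[str]:
    """Names of containers in a pod spec whose two probes are identical."""
    return [c["name"] for c in pod_spec.get("containers", []) if probes_identical(c)]


# Hypothetical pod spec: one exec probe reused for both checks, as in the issue.
shared_probe = {"exec": {"command": ["/bin/sh", "-c", "placeholder-auth-check"]}}
pod_spec = {
    "containers": [
        {
            "name": "solrcloud-node",
            "livenessProbe": shared_probe,
            "readinessProbe": dict(shared_probe),  # equal copy of the same probe
        },
    ]
}
print(violations(pod_spec))  # prints ['solrcloud-node']
```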


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr-operator] HoustonPutman commented on issue #504: Probes for readiness and liveness should be different

Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #504:
URL: https://github.com/apache/solr-operator/issues/504#issuecomment-1348875539

   We could use different commands. We can probably start mandating Solr 8 soon, and at that point we can use the Solr health-check handler as the readiness check and keep the liveness check as is. (We don't necessarily want to restart Solr because of an issue with ZK.) But if you have other suggestions, I am all ears!
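A minimal sketch of the proposed split: the readiness probe targets Solr's health-check handler, while the liveness probe keeps a separate check. Assumptions: `/solr/admin/info/health` is used as the health-check handler path, `/solr/admin/info/system` stands in for the existing liveness check, port 8983 is Solr's default, and the timing values are illustrative. The probes are shown as plain `httpGet` probes for brevity; with auth enabled the operator uses a shell command instead, which this sketch does not model.

```python
SOLR_PORT = 8983  # Solr's default port (assumption for this sketch)
READINESS_PATH = "/solr/admin/info/health"  # health-check handler (Solr 8+), assumed path
LIVENESS_PATH = "/solr/admin/info/system"   # stand-in for the unchanged liveness check


def http_probe(path: str, period_seconds: int, failure_threshold: int) -> dict:
    """Build a Kubernetes-style httpGet probe as a plain dict."""
    return {
        "httpGet": {"path": path, "port": SOLR_PORT, "scheme": "HTTP"},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


def solr_probes() -> dict:
    """Readiness and liveness probes that point at different checks."""
    return {
        # Readiness: can this node take traffic (Jetty up, ZK connected)?
        "readinessProbe": http_probe(READINESS_PATH, period_seconds=5, failure_threshold=3),
        # Liveness: is the process broken badly enough to need a restart?
        # Illustrative, more tolerant threshold so short hiccups don't restart Solr.
        "livenessProbe": http_probe(LIVENESS_PATH, period_seconds=10, failure_threshold=6),
    }


probes = solr_probes()
print(probes["readinessProbe"]["httpGet"]["path"])  # prints /solr/admin/info/health
```

The different paths make the two probes distinct; the more lenient liveness threshold reflects the idea that restarting Solr should be a last resort.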


[GitHub] [solr-operator] HoustonPutman commented on issue #504: Probes for readiness and liveness should be different

Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #504:
URL: https://github.com/apache/solr-operator/issues/504#issuecomment-1349006624

   > So the liveness probe failing will cause k8s to kill the POD, and the readiness probe failing will cause traffic to be temporarily routed to other replicas, is that about right?
   
   > Also, if you have 500 cores on a server, and only one is recovering, it would be a pity if the pod was flagged as not-ready, since Solr is capable of routing traffic to all the other cores. But this is perhaps where the PDB comes in...
   
   Yes, but the readiness probe only affects Services that have the `PublishNotReadyAddresses` option set to `true`. Our common service (one endpoint for all nodes) has this set to true, while the headless service has this set to false. Therefore Solr can still route traffic to the example node you mentioned, as is necessary for things like recovery (all internal requests go to node-specific endpoints, which are managed by the headless service). However, users' initial requests will not end up on that node if they are using the common service.
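As background on this mechanism: in Kubernetes, a Service's `publishNotReadyAddresses` field decides whether readiness matters for that Service. When it is `false` (the default), only pods whose readiness probe passes are published as endpoints; when it is `true`, pods are published regardless of readiness. The simplified model below (hypothetical names and pods, not the real endpoints controller) shows that rule.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ready: bool  # outcome of the pod's readiness probe


def published_pods(pods: list[Pod], publish_not_ready_addresses: bool) -> list[str]:
    """Pods a Service would route to, in this simplified model.

    With publish_not_ready_addresses=True readiness is ignored;
    with False only ready pods are included.
    """
    return [p.name for p in pods if p.ready or publish_not_ready_addresses]


pods = [Pod("solr-0", ready=True), Pod("solr-1", ready=False)]
print(published_pods(pods, publish_not_ready_addresses=False))  # prints ['solr-0']
print(published_pods(pods, publish_not_ready_addresses=True))   # prints ['solr-0', 'solr-1']
```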
   
   I don't really have a strong feeling either way on this, whether it's a good thing or a bad thing. But if we use the health-check endpoint and only use it to check the ZK connection, that is safer. We definitely don't want to route requests to nodes that can't talk to ZK when there are other nodes available. (Hopefully Solr will deal with this itself via live_nodes.)
   
   > Also, a rolling restart uses the readiness probe as a sign that it can move on to take down the next one? So for that reason we'd like all cores to be up.
   
   The readiness probe is also used for this, but luckily for us the ManagedUpdate option for Solr doesn't rely on it much. Instead, it reads the cluster state itself to see when things are healthy enough to move on to the next node(s). So the vast majority of people shouldn't be affected too much by this aspect. (One note: the readiness probe is used when counting the number of "down" pods, not replicas, so the operator will wait until all nodes are "ready" before doing the last pod restart, the overseer.)
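The note about the final restart can be modeled with two toy helpers: restart the overseer's pod last, and only do so once every pod reports ready. This is a hypothetical simplification for illustration (helper and pod names are made up); the real ManagedUpdate logic reads the cluster state and is considerably more involved.

```python
def restart_order(pods: list[str], overseer: str) -> list[str]:
    """Rolling-restart order: every non-overseer pod first, the overseer's pod last.

    Assumes `overseer` is one of `pods`.
    """
    return sorted(p for p in pods if p != overseer) + [overseer]


def can_restart_last(ready: dict[str, bool]) -> bool:
    """Gate for the final (overseer) restart: all pods must be 'ready'."""
    return all(ready.values())


order = restart_order(["solr-2", "solr-0", "solr-1"], overseer="solr-1")
print(order)  # prints ['solr-0', 'solr-2', 'solr-1']
```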
   
   Overall, I think it should be fine for the readiness check to use the health-check handler, just making sure that Jetty and ZK are OK. Maybe we could add some filesystem checks there, but I'm not sure how much benefit that would give. The liveness check should probably stay as it is now until we find a better way to determine whether Solr should be restarted.
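The proposed split can be expressed as two predicates over a node's state: readiness requires Jetty to respond and the ZooKeeper connection to be healthy, while liveness only requires Jetty to respond, so a ZK problem marks the pod not ready without restarting it. `NodeState` and its fields are hypothetical; the exact checks performed by Solr's health-check handler depend on the Solr version.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    jetty_responding: bool  # the HTTP server answers requests
    zk_connected: bool      # the node currently has a working ZooKeeper connection
    recovering_cores: int   # cores in recovery; neither predicate looks at this


def is_ready(state: NodeState) -> bool:
    """Readiness as proposed: Jetty and ZK are both OK."""
    return state.jetty_responding and state.zk_connected


def is_live(state: NodeState) -> bool:
    """Liveness: only a non-responding Jetty should lead to a restart."""
    return state.jetty_responding


zk_outage = NodeState(jetty_responding=True, zk_connected=False, recovering_cores=0)
print(is_ready(zk_outage), is_live(zk_outage))  # prints False True
```

Because `recovering_cores` is ignored, a node with one recovering core out of many would stay ready, which speaks to the 500-core concern quoted above.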


Re: [I] Probes for readiness and liveness should be different [solr-operator]

Posted by "gerlowskija (via GitHub)" <gi...@apache.org>.
gerlowskija closed issue #504: Probes for readiness and liveness should be different
URL: https://github.com/apache/solr-operator/issues/504


[GitHub] [solr-operator] janhoy commented on issue #504: Probes for readiness and liveness should be different

Posted by GitBox <gi...@apache.org>.
janhoy commented on issue #504:
URL: https://github.com/apache/solr-operator/issues/504#issuecomment-1348923782

   So the liveness probe failing will cause k8s to kill the POD, and the readiness probe failing will cause traffic to be temporarily routed to other replicas, is that about right? 
   
   Also, a rolling restart uses the readiness probe as a sign that it can move on to take down the next one? So for that reason we'd like all cores to be up. 
   
   Also, if you have 500 cores on a server, and only one is recovering, it would be a pity if the pod was flagged as not-ready, since Solr is capable of routing traffic to all the other cores. But this is perhaps where the PDB comes in...
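For reference on the first question: in Kubernetes, the kubelet restarts a container after its liveness probe fails `failureThreshold` consecutive times (default 3), while readiness failures only mark the pod not ready, removing it from Services that honor readiness until the probe succeeds `successThreshold` consecutive times (default 1). The replay below is a simplified, hypothetical model of that behavior; it ignores initial delays, timeouts, and startup probes.

```python
def liveness_restarts(results: list[bool], failure_threshold: int = 3) -> int:
    """Count restarts caused by consecutive liveness failures."""
    restarts = failures = 0
    for ok in results:
        failures = 0 if ok else failures + 1
        if failures >= failure_threshold:
            restarts += 1
            failures = 0  # the container restarts and counting starts over
    return restarts


def readiness_timeline(
    results: list[bool], failure_threshold: int = 3, success_threshold: int = 1
) -> list[bool]:
    """Ready/not-ready state after each readiness probe result."""
    ready = False  # pods start out not ready until the probe succeeds
    failures = successes = 0
    timeline = []
    for ok in results:
        if ok:
            successes, failures = successes + 1, 0
            if successes >= success_threshold:
                ready = True
        else:
            failures, successes = failures + 1, 0
            if failures >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline


print(liveness_restarts([False, False, False, True, False, False]))  # prints 1
print(readiness_timeline([True, False, False, False, True]))  # prints [True, True, True, False, True]
```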
   
   I'm still a bit confused about this topic, so I don't have a better idea than using the health endpoint.


[GitHub] [solr-operator] janhoy commented on issue #504: Probes for readiness and liveness should be different

Posted by GitBox <gi...@apache.org>.
janhoy commented on issue #504:
URL: https://github.com/apache/solr-operator/issues/504#issuecomment-1349385158

   > Our common service (one endpoint for all nodes) has this set to true, while the headless service has this set to false.
   
   Thanks for the explanation. So there is a reason to direct clients to the common endpoint, especially if the client is not cluster-state aware.
   
   No rush for me; we have disabled the policy checks. I see ZK-op gets flagged with the same warnings.

