Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/09/14 19:13:55 UTC

[GitHub] [solr-operator] vladiceanu opened a new issue #322: Solr pods graceful shutdown

vladiceanu opened a new issue #322:
URL: https://github.com/apache/solr-operator/issues/322


   ### Context
   In Kubernetes, several things happen between the moment you issue `kubectl delete pod` and the moment the Pod is actually deleted:
   1. the pod's endpoint is removed from the Endpoints object;
   2. if the pod has a `preStop` hook, it runs before `SIGTERM` is sent;
   3. the control plane notifies kube-proxy, CoreDNS, and the ingress controller to deregister the pod's IP address, so that no further traffic is sent to it
   **AND, in parallel,**
   the app receives `SIGTERM` and, if it handles the signal, starts a graceful shutdown; otherwise Kubernetes waits for `terminationGracePeriodSeconds` (default 30s) to elapse and then sends `SIGKILL`;
   4. the pod is deleted.
   
   ### Graceful shutdown
   For a graceful shutdown, one condition must hold: no traffic is sent to an IP whose pod has already been deleted. That is the job of step 3 above, but since those components (kube-proxy, CoreDNS, ingress controller) might be busy with something else, **there is no guarantee that the IP will be removed from their state before the Pod is gone**. How long does it take? It depends: some of them might take less than a second, others a bit longer.
   
   **Race condition**
   As mentioned, deregistering the IP from kube-proxy, CoreDNS, and the ingress controller happens in parallel with `SIGTERM` being sent to the app, which can cause race conditions. One of them: what if the pod is deleted before the IP is deregistered? That would be a problem, since traffic might be sent to a non-existent IP.
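
To make the race concrete, here is a minimal sketch (not part of the operator, and not how Kubernetes computes anything) that models the two timelines as plain numbers. The function name `misrouted_window` and all timings are hypothetical, chosen only for illustration: the window in which traffic can reach a dead IP is the gap between the moment the pod stops serving and the moment the last component has deregistered its IP.

```python
def misrouted_window(serving_stops_at: float, deregistered_at: float) -> float:
    """Return how long (in seconds) traffic may still be routed to a pod
    that no longer serves.

    Both arguments are seconds since `kubectl delete pod` was issued:
    - serving_stops_at: when the app stops accepting requests
    - deregistered_at:  when kube-proxy, CoreDNS, and the ingress controller
                        have all dropped the pod's IP (hypothetical figure,
                        Kubernetes gives no guarantee about this time)
    """
    # If deregistration finishes first, there is no window at all.
    return max(0.0, deregistered_at - serving_stops_at)


# The app stops after 2s but deregistration takes 5s: 3s of misrouted traffic.
print(misrouted_window(serving_stops_at=2.0, deregistered_at=5.0))   # 3.0

# The app keeps serving until the 30s grace period ends: no misrouted traffic.
print(misrouted_window(serving_stops_at=30.0, deregistered_at=5.0))  # 0.0
```

The model ignores everything except ordering, which is the point: the shutdown is only graceful when the second timestamp is not later than the first.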
   
   
   ### Issue statement
   Graceful shutdown of SolrCloud. Currently, we use a `preStop` hook that runs `solr stop -p 8983` (which, behind the scenes, sends `SIGQUIT` to the process) to stop the Solr instance running in the background on port 8983.
   
   But, as we already know, the `preStop` hook (step 2) is executed before kube-proxy, CoreDNS, and the ingress controller have received the event to deregister the IP address from their local state (step 3). The hook therefore stops the Solr instance before its IP is deregistered, so traffic will be sent to a non-existent IP.
   
   A few ways to handle that:
   - convert `SIGTERM` into `SIGQUIT` and forward it to the Solr processes with a process supervisor such as https://github.com/Yelp/dumb-init or https://github.com/krallin/tini;
   - a bit harsh: do nothing and wait for `SIGKILL`; this gives kube-proxy, CoreDNS, and the ingress controller a better chance to deregister the IP within `terminationGracePeriodSeconds` (default 30s), so that no traffic is being sent to the Pod by the time `SIGKILL` fires and the pod is forcefully deleted.
   
   What other options are available?
   
   cc @giannis
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr-operator] HoustonPutman closed issue #322: Solr pods graceful shutdown

Posted by GitBox <gi...@apache.org>.
HoustonPutman closed issue #322:
URL: https://github.com/apache/solr-operator/issues/322


   




[GitHub] [solr-operator] vladiceanu commented on issue #322: Solr pods graceful shutdown

Posted by GitBox <gi...@apache.org>.
vladiceanu commented on issue #322:
URL: https://github.com/apache/solr-operator/issues/322#issuecomment-919443068


   hi @HoustonPutman @thelabdude, could you please have a look? 

