Posted to issues@solr.apache.org by "fliphess (via GitHub)" <gi...@apache.org> on 2023/05/30 15:04:45 UTC

[GitHub] [solr-operator] fliphess opened a new issue, #574: Solr operator not updating all container images on helm update

fliphess opened a new issue, #574:
URL: https://github.com/apache/solr-operator/issues/574

   Hi :) 
   
   We use a GitLab pipeline that runs Helm to deploy our Solr cluster. Because we want some utilities on board, such as the AWS CLI for restoring from a backup, we build a new Docker image in every pipeline run. The image is tagged with the Git commit SHA of the repository, so the image tag changes with every pipeline run.
   
   We noticed something weird: during an update, some of the nodes are updated but not all of them. With 3 Solr pods, 2 of them use the latest image but one does not (the setup-zk init container and the solrcloud-node container use the same image).
   
   Both the StatefulSet and the SolrCloud object show that all nodes are ready and up to date, but one of the pods is not updated at all.
   
   For Solr we use the latest v8 image (`8.11.2`) from Docker Hub, to which we add some extra helper tools for when things go haywire, and we run version `0.7.0` of the operator.
   
   I'm not sure what information would be useful, but I can provide a lot :) 
   
   
   ```
   kubectl get pods  -n cluster cluster-solrcloud-0 cluster-solrcloud-1 cluster-solrcloud-2 -o yaml  | grep image: | grep solr-cluster | cut -d: -f3 | sort | uniq -c
         2 4119877046180762ee630bd4165c839c488371b7
        10 5f639c6751ab8faa1bd485a3e1b0f7362b3437b2
   ```
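
The count above shows that two image tags are in use, but not which pod carries which tag. A minimal sketch of a per-pod listing follows, assuming `kubectl` is configured for the cluster. The function names `list_pod_images` and `stale_pods` are hypothetical helpers, not part of the operator or of `kubectl`.

```shell
# Hypothetical helper: list the given pods with the images of their
# containers and init containers, one pod per line.
list_pod_images() {
  local namespace="$1"
  shift
  kubectl get pods -n "$namespace" "$@" --no-headers \
    -o custom-columns='POD:.metadata.name,IMAGES:.spec.containers[*].image,INIT:.spec.initContainers[*].image'
}

# Hypothetical helper: read that listing on stdin and print the names of
# pods whose IMAGES column does not contain the expected tag.
stale_pods() {
  local expected_tag="$1"
  awk -v tag="$expected_tag" 'index($2, tag) == 0 { print $1 }'
}
```

Possible usage, with the namespace and pod names from the command above and the expected tag passed in as a placeholder: `list_pod_images cluster cluster-solrcloud-0 cluster-solrcloud-1 cluster-solrcloud-2 | stale_pods "$EXPECTED_TAG"`. The INIT column is printed only for inspection; the filter checks the main containers.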
   
   I've checked the logs of the operator and I don't see any errors: the operator loops over the nodes and updates them, but skips the last one (with 3 replicas, node 2 is not updated). 
   
   Our update strategy is as follows:
   
   ```
     updateStrategy:
       managed:
         maxPodsUnavailable: 1
         maxShardReplicasUnavailable: 1
       method: Managed
   ```
   I have attached my solrcloud yaml to this issue :)
   
   [Uploading solrcloud.txt…]()
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org

[GitHub] [solr-operator] fliphess commented on issue #574: Solr operator not updating all container images on helm update

Posted by "fliphess (via GitHub)" <gi...@apache.org>.

fliphess commented on issue #574:
URL: https://github.com/apache/solr-operator/issues/574#issuecomment-1568842531

   Hey @HoustonPutman! Thanks for the reply. 
   
   The weird thing is that the StatefulSet itself shows the correct image tag, and so does the SolrCloud YAML object. The SolrCloud status indicates it's up to date and the solr-operator is not generating new logging.
   In the meantime all the pods in the solr cluster are up, but not all of them are properly updated. 
   
   Thinking out loud: I haven't checked `kubectl events` yet. I'll do that first thing tomorrow morning. Perhaps some node anti-affinity rule is preventing the new pod from being scheduled while the old one is terminating, or something similar. 



[GitHub] [solr-operator] HoustonPutman commented on issue #574: Solr operator not updating all container images on helm update

Posted by "HoustonPutman (via GitHub)" <gi...@apache.org>.

HoustonPutman commented on issue #574:
URL: https://github.com/apache/solr-operator/issues/574#issuecomment-1568740624

   Ahh yeah, that log line might be a bit unclear. You cannot "cancel" an update to a pod; it is postponed until later, and the pod will eventually be updated (if the conditions to update are met).
   
   That log line is telling you that one of the pods is not healthy, so at some point the pod is not "ready". If there are still pods that have not been updated, look at the most recent log lines in the operator to find out why it isn't continuing and deleting the last few pods. If that log line is still being printed, then for some reason the solr operator does not believe that all Solr pods are "ready".
   
   Maybe the cluster is having issues scheduling the pods after they are deleted? Can you do a `kubectl get pods` to show the solr pods?



[GitHub] [solr-operator] fliphess closed issue #574: Solr operator not updating all container images on helm update

Posted by "fliphess (via GitHub)" <gi...@apache.org>.

fliphess closed issue #574: Solr operator not updating all container images on helm update
URL: https://github.com/apache/solr-operator/issues/574



[GitHub] [solr-operator] HoustonPutman commented on issue #574: Solr operator not updating all container images on helm update

Posted by "HoustonPutman (via GitHub)" <gi...@apache.org>.

HoustonPutman commented on issue #574:
URL: https://github.com/apache/solr-operator/issues/574#issuecomment-1570355154

   > In the meantime all the pods in the solr cluster are up, but not all of them are properly updated.
   
   > the solr-operator is not generating new logging.
   
   It is very, very strange for both of these things to be true. If you could provide the output of `kubectl describe solrcloud <name>`, that would be useful for looking at the solrcloud's status.



[GitHub] [solr-operator] fliphess commented on issue #574: Solr operator not updating all container images on helm update

Posted by "fliphess (via GitHub)" <gi...@apache.org>.

fliphess commented on issue #574:
URL: https://github.com/apache/solr-operator/issues/574#issuecomment-1570495591

   I found something: when I kill the solr-operator pod, everything starts running again, and soon after all pods are on the same image version. The solr-operator then starts logging again and triggers new backups etc.
   So apparently the solr-operator becomes unresponsive. 
   
   Before digging any further, let me first check what happens if I give the solr-operator a lot more CPU and memory, perhaps it's running out of something... 
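
The workaround and the follow-up described above (restarting the operator, then giving it more CPU and memory) could be sketched as follows. This is only a sketch: it assumes the operator runs as a Deployment named `solr-operator`, the namespace and resource values are placeholders to fill in, and the helper names are hypothetical. If the operator is installed through Helm, a later Helm upgrade may overwrite resources set this way.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: restart the operator and wait for the new pod.
# The default deployment name "solr-operator" is an assumption.
restart_operator() {
  local namespace="$1" deployment="${2:-solr-operator}"
  "$KUBECTL" rollout restart "deployment/$deployment" -n "$namespace"
  "$KUBECTL" rollout status "deployment/$deployment" -n "$namespace" --timeout=120s
}

# Hypothetical helper: set resource requests and limits on the operator.
# Both values use the kubectl form "cpu=<amount>,memory=<amount>".
set_operator_resources() {
  local namespace="$1" requests="$2" limits="$3" deployment="${4:-solr-operator}"
  "$KUBECTL" set resources "deployment/$deployment" -n "$namespace" \
    --requests="$requests" --limits="$limits"
}
```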



[GitHub] [solr-operator] fliphess commented on issue #574: Solr operator not updating all container images on helm update

Posted by "fliphess (via GitHub)" <gi...@apache.org>.

fliphess commented on issue #574:
URL: https://github.com/apache/solr-operator/issues/574#issuecomment-1578554512

   I'm closing this for now: after changing the resources for our operator, the problem hasn't appeared again, so I think this is a corner case in our own cluster rather than a problem in the operator itself. 
   
   Thanks for your help and suggestions Houston! :) 



[GitHub] [solr-operator] fliphess commented on issue #574: Solr operator not updating all container images on helm update

Posted by "fliphess (via GitHub)" <gi...@apache.org>.

fliphess commented on issue #574:
URL: https://github.com/apache/solr-operator/issues/574#issuecomment-1568641585

   Adding to this issue: In the solr-operator logging I see a lot of these: 
   
   ```
   solr-operator-5c7899cdff-ng2tl solr-operator 2023-05-30T15:13:56Z	INFO	ManagedUpdateSelector	Pod update selection canceled. The number of updated pods unavailable equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr-cluster","namespace":"solr-cluster"}, "namespace": "solr-cluster", "name": "solr-cluster", "reconcileID": "c88d7a5a-d2dd-498e-a9a3-f0c789e86ab1", "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0, "maxPodsUnavailable": 1}
   ```
   
   Does this mean the update for a specific pod is canceled? Or is it postponed to be updated at a later time? 
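
Read literally, the message in the log line above describes a threshold check on two of the logged counters. The sketch below only restates that message; it is not the operator's actual code, which may also take the other counters into account, and `selection_postponed` is a hypothetical name.

```shell
# Succeeds (exit status 0) when, per the wording of the log message, no
# further pod is selected for update in this pass:
# unavailableUpdatedPods >= maxPodsUnavailable.
selection_postponed() {
  local unavailable_updated_pods="$1" max_pods_unavailable="$2"
  [ "$unavailable_updated_pods" -ge "$max_pods_unavailable" ]
}

# Values taken from the log line: unavailableUpdatedPods=1, maxPodsUnavailable=1.
if selection_postponed 1 1; then
  echo "selection postponed until an updated pod becomes ready"
fi
```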
   
   

