Posted to issues@solr.apache.org by "Jan Høydahl (Jira)" <ji...@apache.org> on 2023/03/28 20:41:00 UTC
[jira] [Commented] (SOLR-16722) API to flag a solr node NOT READY for requests
[ https://issues.apache.org/jira/browse/SOLR-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706145#comment-17706145 ]
Jan Høydahl commented on SOLR-16722:
------------------------------------
One simple option is to create a new top-level znode {{/disabled_nodes}}:
{code:java}
/live_nodes
+-- foo:8983_node
+-- bar:8983_node
/disabled_nodes
+-- bar:8983_node{code}
The znode will normally be empty (or absent), but if it exists with one or more children, those nodes are flagged as disabled for traffic. This could be because the solr-operator is planning to shut down the node, or it could be a way to temporarily repel traffic from a node during troubleshooting. SolrJ would be updated to consider {{disabled_nodes}} in addition to replica state and {{live_nodes}}. The CLUSTERSTATUS response should also include this information.
The node would still be "live" and would still serve any request sent to it, i.e. the znode is only an advisory signal to SolrJ or other load balancers.
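As a rough illustration of the client-side check, the sketch below filters a list of live nodes against the children of {{/disabled_nodes}}. It is plain Python rather than SolrJ code, and the function name {{routable_nodes}} is hypothetical; only the znode names and node names come from the example above.

```python
def routable_nodes(live_nodes, disabled_nodes):
    """Return the live nodes that are not flagged as disabled, keeping order.

    Hypothetical helper: both arguments are plain lists of node names, as a
    client might read them from the children of /live_nodes and
    /disabled_nodes. A missing /disabled_nodes znode is passed as None.
    """
    disabled = set(disabled_nodes or ())
    return [node for node in live_nodes if node not in disabled]


live = ["foo:8983_node", "bar:8983_node"]
disabled = ["bar:8983_node"]

# bar is still live, but a cooperating client would stop routing to it.
print(routable_nodes(live, disabled))
```

Because the flag is only advisory, a client that ignores {{/disabled_nodes}} would keep sending requests to bar, and bar would keep answering them.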
I think the children of {{disabled_nodes}} should be ephemeral, so that entries are removed when a node is shut down. As a consequence, the flag cannot be used to repel traffic from a node across node restarts.
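The effect of making the entries ephemeral can be shown with a toy in-memory model. This is not the ZooKeeper client API: the class {{ToyZk}} and its methods are invented for illustration, and the sketch assumes that the node's own session owns its {{/disabled_nodes}} entry.

```python
class ToyZk:
    """Toy stand-in for ZooKeeper that only tracks ephemeral znodes."""

    def __init__(self):
        # Maps each ephemeral znode path to the session id that owns it.
        self._owners = {}

    def create_ephemeral(self, path, session_id):
        self._owners[path] = session_id

    def close_session(self, session_id):
        # Ephemeral znodes disappear when the owning session ends.
        self._owners = {
            path: owner
            for path, owner in self._owners.items()
            if owner != session_id
        }

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(
            path[len(prefix):] for path in self._owners if path.startswith(prefix)
        )


zk = ToyZk()
zk.create_ephemeral("/live_nodes/bar:8983_node", "bar-session")
zk.create_ephemeral("/disabled_nodes/bar:8983_node", "bar-session")

zk.close_session("bar-session")          # bar shuts down
print(zk.children("/disabled_nodes"))    # the flag went away with the session
```

After a restart, bar would register under {{/live_nodes}} with a new session and would not be flagged as disabled unless something sets the flag again.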
There could also be a new cluster API to set and clear the znode. We already have an API (CLUSTERSTATUS) to query it.
> API to flag a solr node NOT READY for requests
> ----------------------------------------------
>
> Key: SOLR-16722
> URL: https://issues.apache.org/jira/browse/SOLR-16722
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Jan Høydahl
> Priority: Major
>
> Spinoff from solr operator PR [https://github.com/apache/solr-operator/issues/529]
> When solr-operator performs a rolling restart or rolling upgrade, it will stop one node at a time, but SolrJ (both external and internal) will continue sending traffic to the node until requests start failing, because by the time SolrJ picks up the "live_nodes" change, it is too late.
> While the operator PR mentioned above will prevent external requests through the k8s service to the draining node, it will not prevent internal traffic.
> This issue thus aims to introduce some API or mechanism to flag a Solr node as NOT READY for traffic.