Posted to issues@solr.apache.org by "Jan Høydahl (Jira)" <ji...@apache.org> on 2023/03/28 20:41:00 UTC

[jira] [Commented] (SOLR-16722) API to flag a solr node NOT READY for requests

    [ https://issues.apache.org/jira/browse/SOLR-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706145#comment-17706145 ] 

Jan Høydahl commented on SOLR-16722:
------------------------------------

One simple option is to create a new top-level znode {{/disabled_nodes}}:
{code:java}
/live_nodes
  +-- foo:8983_node
  +-- bar:8983_node
/disabled_nodes
  +-- bar:8983_node{code}
The znode will normally be empty (or nonexistent), but if it exists with one or more children, those nodes are flagged as disabled for traffic. This could be because the solr-operator is planning to shut down the node, or it could be a way to temporarily repel traffic from a node during troubleshooting. SolrJ would be updated to consider {{disabled_nodes}} in addition to replica state and {{live_nodes}}. The CLUSTERSTATUS response should also include this information.
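
As a rough illustration of the client-side check, here is a minimal sketch of how a load balancer might combine the two lists. The class and method names ({{DisabledNodeFilter}}, {{routableNodes}}) are hypothetical and are not part of SolrJ; the inputs are assumed to be the child names already read from {{/live_nodes}} and {{/disabled_nodes}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not an existing SolrJ class. */
final class DisabledNodeFilter {

    private DisabledNodeFilter() {}

    /**
     * Returns the live nodes that are not flagged as disabled, preserving order.
     *
     * @param liveNodes     child names of /live_nodes
     * @param disabledNodes child names of the proposed /disabled_nodes (empty if the znode is absent)
     */
    static List<String> routableNodes(List<String> liveNodes, Set<String> disabledNodes) {
        List<String> routable = new ArrayList<>();
        for (String node : liveNodes) {
            // A node stays "live" while disabled; it is only skipped as a request target.
            if (!disabledNodes.contains(node)) {
                routable.add(node);
            }
        }
        return routable;
    }
}
```

With the tree above, {{routableNodes}} would keep {{foo:8983_node}} and skip {{bar:8983_node}}. What a client should do when every live node is disabled is not settled here; this sketch simply returns an empty list.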

The node would still be "live" and could still receive traffic, i.e. the znode is only a signal to SolrJ or other load balancers.

I think the children of {{disabled_nodes}} should be ephemeral so that entries are removed when a node is shut down. As a consequence, the flag could not be used to repel traffic from a node across node restarts.

There could also be a new cluster API to set and clear the znode. We already have an API (CLUSTERSTATUS) to query it.
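
To make the ephemeral semantics and the set/clear operations concrete, here is a small in-memory model. It is not ZooKeeper or Solr code: the class {{DisabledNodesModel}} and its methods are hypothetical, and it assumes each entry is owned by the ZooKeeper session of the node it names, so the entry disappears when that session ends.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for the proposed /disabled_nodes znode (illustration only). */
final class DisabledNodesModel {

    // Node name -> id of the session that owns the ephemeral entry (assumed ownership model).
    private final Map<String, Long> ownerByNode = new HashMap<>();

    /** "Set" operation: flag a node as disabled on behalf of the given session. */
    void disable(String nodeName, long sessionId) {
        ownerByNode.put(nodeName, sessionId);
    }

    /** "Clear" operation: remove the flag explicitly. */
    void enable(String nodeName) {
        ownerByNode.remove(nodeName);
    }

    /** Mimics ephemeral cleanup: every entry owned by the closed session is dropped. */
    void sessionClosed(long sessionId) {
        ownerByNode.values().removeIf(owner -> owner == sessionId);
    }

    /** Snapshot of the currently disabled node names, sorted for stable output. */
    Set<String> disabledNodes() {
        return new TreeSet<>(ownerByNode.keySet());
    }
}
```

In this model, disabling {{bar:8983_node}} and then closing its session leaves the set empty again, which is why the flag would not survive a restart of that node.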

> API to flag a solr node NOT READY for requests
> ----------------------------------------------
>
>                 Key: SOLR-16722
>                 URL: https://issues.apache.org/jira/browse/SOLR-16722
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Jan Høydahl
>            Priority: Major
>
> Spinoff from solr operator PR [https://github.com/apache/solr-operator/issues/529]
> When solr-operator performs a rolling restart or rolling upgrade, it will stop one node at a time, but SolrJ (both external and internal) will continue sending traffic to the node until requests start failing, because by the time SolrJ picks up the "live_nodes" change, it is too late.
> While the operator PR mentioned above will prevent external requests through the k8s service to the draining node, it will not prevent internal traffic.
> This issue thus aims to introduce some API or mechanism to flag a Solr node as NOT READY for traffic.


