Posted to issues@lucene.apache.org by "Erick Erickson (Jira)" <ji...@apache.org> on 2019/10/30 13:00:00 UTC

[jira] [Comment Edited] (SOLR-13882) Collections API COLSTATUS does not check live_nodes when reporting replica's status

    [ https://issues.apache.org/jira/browse/SOLR-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962997#comment-16962997 ] 

Erick Erickson edited comment on SOLR-13882 at 10/30/19 12:59 PM:
------------------------------------------------------------------

This fixes the problem, but I have only verified it with a single manual test. It still needs an automated test, perhaps in {code}CollectionsAPISolrJTest.testColStatus{code}?

I won't pursue this ATM, so anyone who wants to pick it up and add a test, please do.

NOTE: the replicas will still be reported as "active" for some time after the node is killed, until ZooKeeper removes the node's entry from live_nodes.

[~ab] Does reporting this state as "down" break anything you know of?

For discussion: {code}Replica.isActive() and Replica.getState(){code} are trappy. It's perfectly reasonable to think "If the replica says it's active, it must be". I've spent time debugging this before, this issue is another case where it bites, and I've seen it mentioned more than a few times on the dev and user lists.

I know it would mean changes in a number of places in the code, and there are some legitimate places where {code}Replica.isActive(){code} _should_ report the replica as "active" when the node is _not_ in live_nodes. That said, what do people think about:

- changing at least these two methods to return "down" if the replica's node isn't in live_nodes

- creating some "expert" level method to return what {{getState()}} does now (i.e., return "active" if the state is "active" even though the node isn't in live_nodes) for those cases that need to make a distinction
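
To make the two bullets above concrete, here is a minimal, self-contained sketch of the split. It deliberately does not use the real Solr classes: {{EffectiveReplicaState}}, {{ReplicaInfo}}, {{State}}, {{effectiveState()}} and {{recordedState()}} are hypothetical names invented for illustration, and passing the live-node set in as a parameter is only an assumption about how the check might be wired up.

```java
import java.util.Collections;
import java.util.Set;

public class EffectiveReplicaState {

    /** Hypothetical stand-in for the replica states; not Solr's actual enum. */
    public enum State { ACTIVE, DOWN, RECOVERING, RECOVERY_FAILED }

    /** Hypothetical minimal view of a replica: its node and the state recorded for it. */
    public static final class ReplicaInfo {
        final String nodeName;
        final State recorded;

        public ReplicaInfo(String nodeName, State recorded) {
            this.nodeName = nodeName;
            this.recorded = recorded;
        }
    }

    /** Proposed default behavior: report DOWN whenever the replica's node is not in live_nodes. */
    public static State effectiveState(ReplicaInfo replica, Set<String> liveNodes) {
        return liveNodes.contains(replica.nodeName) ? replica.recorded : State.DOWN;
    }

    /** Proposed "expert" accessor: the recorded state, without consulting live_nodes. */
    public static State recordedState(ReplicaInfo replica) {
        return replica.recorded;
    }

    public static void main(String[] args) {
        // Only host1 is live; host2 stands for the instance that was killed.
        Set<String> liveNodes = Collections.singleton("host1:8983_solr");
        ReplicaInfo onLiveNode = new ReplicaInfo("host1:8983_solr", State.ACTIVE);
        ReplicaInfo onDeadNode = new ReplicaInfo("host2:7574_solr", State.ACTIVE);

        System.out.println(effectiveState(onLiveNode, liveNodes));
        System.out.println(effectiveState(onDeadNode, liveNodes));
        System.out.println(recordedState(onDeadNode));
    }
}
```

With this shape, callers that only want to know whether a replica is usable get the live_nodes check by default, while the few places that genuinely need the recorded state have to ask for it explicitly.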


was (Author: erickerickson):
This fixes the problem, I only tested with a single manual test. Needs a test though, perhaps in {code}CollectionsAPISolrJTest.testColStatus{code}?

I won't pursue this ATM, so anyone who wants to pick it up and add a test, please do.

NOTE: the replicas will be reported as "active" for some time after the node is killed while waiting for Zookeeper to remove the entry from live_nodes.

> Collections API COLSTATUS does not check live_nodes when reporting replica's status
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-13882
>                 URL: https://issues.apache.org/jira/browse/SOLR-13882
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Priority: Major
>         Attachments: SOLR-13882.patch
>
>
> The COLSTATUS command will report all replicas as "active" even when the node is not in live_nodes.
> To reproduce:
>  * Start two Solr instances
>  * create a collection with replicas on both
>  * issue a "kill -9" to one of the Solr instances.
>  * issue the COLSTATUS command
> Result:
> {code}
> {
>  "responseHeader":{
>    "status":0,
>    "QTime":7},
>    "gettingstarted":{
>      "stateFormat":2,
>      "znodeVersion":15,
>      "properties":{
>         "autoAddReplicas":"false",
>         "maxShardsPerNode":"-1",
>         "nrtReplicas":"2",
>         "pullReplicas":"0",
>         "replicationFactor":"2",
>         "router":{"name":"compositeId"},
>         "tlogReplicas":"0"},
>         "activeShards":2,
>         "inactiveShards":0,
>         "schemaNonCompliant":["(NONE)"],
>         "shards":{
>            "shard1":{
>            "state":"active",
>            "range":"80000000-ffffffff",
>            "replicas":{
>                "total":2,
> #####          "active":2,
> #####          "down":0,
>                "recovering":0,
>               "recovery_failed":0},
> {code}


