Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2019/02/01 03:05:00 UTC

[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

    [ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757918#comment-16757918 ] 

Mark Miller commented on SOLR-13189:
------------------------------------

And it was just starting to feel good being away again ...

As an aside, that wait-for-recoveries call should be nixed because it's flaky after a collection create call. We need to use wait calls that specify the shards and replicas to wait for, like the SolrCloudTest tests do now.
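To illustrate what "specify the shards and replicas to wait for" could look like, here is a minimal sketch in plain Java. It does not use any Solr API: the cluster state is modeled as a hypothetical map from shard name to the list of its replica states, and ShapeWait and shape are names made up for this example. The idea is only that the wait condition names the exact shape expected, rather than asking "is anything still recovering?".

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in, not a Solr class: cluster state is modeled as
// shard name -> state string of each replica in that shard.
class ShapeWait {

    // Matches only when there are exactly `shards` shards and every shard
    // has exactly `replicasPerShard` replicas, all of them "active".
    static Predicate<Map<String, List<String>>> shape(int shards, int replicasPerShard) {
        return state -> state.size() == shards
            && state.values().stream().allMatch(replicas ->
                   replicas.size() == replicasPerShard
                   && replicas.stream().allMatch("active"::equals));
    }

    public static void main(String[] args) {
        // Right after a create call a replica may not be registered yet,
        // so the state can look "quiet" without having the expected shape.
        Map<String, List<String>> justCreated = Map.of(
            "shard1", List.of("active"),
            "shard2", List.of("active", "active"));
        Map<String, List<String>> settled = Map.of(
            "shard1", List.of("active", "active"),
            "shard2", List.of("active", "active"));

        System.out.println("just created matches 2x2: " + shape(2, 2).test(justCreated));
        System.out.println("settled matches 2x2:      " + shape(2, 2).test(settled));
    }
}
```

A predicate like this would then be handed to whatever polling helper the test framework provides, so the wait only returns once the named shards and replicas are all present and active.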

My guess is that you are hitting the eventually consistent nature of the system.

In older versions these tests might have worked because, before the request returned to the client, the leader would have called the replica and told it to go into recovery. I believe we no longer make these calls (for good reason: HTTP calls tied to updates were no good). So a replica will only enter recovery when it realizes it should via ZooKeeper communication.

The system will be eventually consistent, but there is no promise that it is consistent at any given moment, even when all replicas are active. You must be willing to wait a short time for consistency, and this test does not.
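A rough sketch of "being willing to wait" in a test, in plain Java with no Solr calls. The waitFor helper, the EventuallyConsistent class and the simulated replica counter are all made up for illustration; in a real test the condition would be something like "every replica reports the same document count as the leader". The point is that the assertion is retried until a deadline instead of being checked once, right after the update returns.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Solr: retry a condition until it holds
// or a deadline passes.
class EventuallyConsistent {

    // Returns true as soon as the condition holds, false if the timeout
    // elapses first (or the waiting thread is interrupted).
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long leaderDocs = 10;
        AtomicLong replicaDocs = new AtomicLong(7); // simulated replica that is behind

        // Simulate the replica catching up a little later, e.g. after recovery.
        Thread recovery = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            replicaDocs.set(leaderDocs);
        });
        recovery.start();

        // A single immediate check would likely see the stale count;
        // polling with a deadline tolerates the delay.
        boolean consistent = waitFor(() -> replicaDocs.get() == leaderDocs, 5_000, 50);
        System.out.println("replica consistent with leader: " + consistent);
    }
}
```

The timeout bounds how long the test tolerates inconsistency, so a replica that never catches up still fails the test rather than hanging it.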

> Need reliable example (Test) of how to use TestInjection.failReplicaRequests
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-13189
>                 URL: https://issues.apache.org/jira/browse/SOLR-13189
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13189.patch
>
>
> We need a test that reliably demonstrates the usage of {{TestInjection.failReplicaRequests}} and shows what steps a test needs to take after issuing updates to reliably "pass" (finding all index updates that succeeded from the client's perspective) even in the event of an (injected) replica failure.
> As things stand now, it does not seem that any test using {{TestInjection.failReplicaRequests}} passes reliably -- *and it's not clear if this is due to poorly designed tests, or an indication of a bug in distributed updates / LIR*


