Posted to issues@lucene.apache.org by "Michael Gibney (Jira)" <ji...@apache.org> on 2021/03/05 20:43:00 UTC

[jira] [Commented] (SOLR-15221) Distributed commit errors are not propagated to the initiating client

    [ https://issues.apache.org/jira/browse/SOLR-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296329#comment-17296329 ] 

Michael Gibney commented on SOLR-15221:
---------------------------------------

I was initially (and still am) inclined to think that this should be addressed by consistently propagating errors to the client. But my first stab at "fixing" it (I tried several different ways of adding the relevant {{error}} to {{errorsForClient}} in [DistributedZkUpdateProcessor.doDistribFinish()|https://github.com/apache/lucene-solr/blob/99a4bbf3a0ab93/solr/core/src/java/org/apache/solr/update/processor/DistributedZkUpdateProcessor.java#L1075-L1100]) was mostly successful, but it reliably fails exactly one test in the existing suite ([HttpPartitionOnCommitTest.test|https://github.com/apache/lucene-solr/blob/99a4bbf3a0ab93/solr/core/src/test/org/apache/solr/cloud/HttpPartitionOnCommitTest.java#L181-L197]). This makes me think I'm missing something, possibly related to replica recovery, so I'm not sure how to proceed.
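One possible reading of "consistently propagating errors" can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Solr code: {{ConsistentCommitErrors}}, {{ReplicaError}}, {{collectErrorsForClient}} and {{responseStatus}} are made-up names, the list is only named after {{errorsForClient}} (it is not the real field), and the error status is modeled as 500 purely as an assumption. Every commit failure, local or remote, lands in one client-facing list, and any entry fails the whole request.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Solr code): all commit failures, local or remote,
 * are collected into one client-facing list, and any entry fails the request.
 */
public class ConsistentCommitErrors {

    /** Failure reported by one replica's commit (hypothetical type). */
    record ReplicaError(String replica, String message) {}

    /** Merges the local failure (may be null) with remote failures into one list. */
    static List<ReplicaError> collectErrorsForClient(ReplicaError localError,
                                                     List<ReplicaError> remoteErrors) {
        List<ReplicaError> errorsForClient = new ArrayList<>();
        if (localError != null) {
            errorsForClient.add(localError);
        }
        errorsForClient.addAll(remoteErrors);
        return errorsForClient;
    }

    /** Status depends on whether anything failed, not on which core was "local". */
    static int responseStatus(List<ReplicaError> errorsForClient) {
        return errorsForClient.isEmpty() ? 200 : 500; // 500 is an assumed error status
    }

    public static void main(String[] args) {
        // Only a remote replica fails; the client still sees an error.
        List<ReplicaError> errors = collectErrorsForClient(
                null, List.of(new ReplicaError("replicaB", "commit failed")));
        System.out.println(responseStatus(errors)); // prints 500
    }
}
```

The model deliberately ignores recovery and partition handling, which is exactly where the failing test suggests the real change gets harder.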

I've attached [^SOLR-15221-initial-tests.patch], which contains several tests that document the existing behavior as-is (these pass) and one {{AwaitsFix}} test that fails when asserting that responses are consistent (with errors thrown by different replicas).

> Distributed commit errors are not propagated to the initiating client 
> ----------------------------------------------------------------------
>
>                 Key: SOLR-15221
>                 URL: https://issues.apache.org/jira/browse/SOLR-15221
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Michael Gibney
>            Priority: Minor
>         Attachments: SOLR-15221-initial-tests.patch
>
>
> Distributed commit errors are not currently propagated back to the client that initially issued the commit command. So any commit (e.g., issued via {{CloudSolrClient}}, {{curl}} to the HTTP API, etc.) responds with HTTP status code {{200}} and API status {{0}} as long as the commit to the "local" core arbitrarily associated with the request succeeds. This happens no matter how many distributed commits succeed or fail (at least to other leader replicas; I've only tested with replication factor 1 so far).
> This inconsistency (an error on the arbitrarily determined "local" replica propagates to the client, but errors on all other replicas do not) is the focus of this issue; however, the issue is raised with no preconceived notions wrt _how_ the inconsistency should be resolved.
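To make the inconsistency concrete, the behavior described above can be modeled in plain Java. This is a hypothetical sketch, not Solr code: {{CommitStatusModel}}, {{CoreCommit}} and {{observedStatus}} are made-up names, and the error status is modeled as 500 only as an assumption. The same single failure (core "b") yields different client-visible statuses depending solely on which core happened to be "local".

```java
import java.util.List;

/**
 * Hypothetical model (not Solr code) of the reported behavior: only the
 * outcome of the "local" core's commit is visible to the client.
 */
public class CommitStatusModel {

    /** Outcome of a commit on one core (hypothetical type). */
    record CoreCommit(String core, boolean local, boolean failed) {}

    /** Returns the status the client sees under the reported behavior. */
    static int observedStatus(List<CoreCommit> commits) {
        for (CoreCommit c : commits) {
            if (c.local() && c.failed()) {
                return 500; // local failure propagates (500 is an assumed error status)
            }
        }
        return 200; // failures on non-local cores are dropped
    }

    public static void main(String[] args) {
        List<CoreCommit> bIsLocal = List.of(
                new CoreCommit("a", false, false),
                new CoreCommit("b", true, true));
        List<CoreCommit> aIsLocal = List.of(
                new CoreCommit("a", true, false),
                new CoreCommit("b", false, true));
        System.out.println(observedStatus(bIsLocal)); // prints 500
        System.out.println(observedStatus(aIsLocal)); // prints 200
    }
}
```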


