Posted to dev@lucene.apache.org by "Mano Kovacs (JIRA)" <ji...@apache.org> on 2018/10/15 13:25:00 UTC

[jira] [Commented] (SOLR-12708) Async collection actions should not hide failures

    [ https://issues.apache.org/jira/browse/SOLR-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650196#comment-16650196 ] 

Mano Kovacs commented on SOLR-12708:
------------------------------------

[~varunthacker], thank you for the explanation! I am still a bit behind; there is one part I don't understand.

bq. CreateShardCmd is one core admin API call . So the response is either a success or failure. Hence the if-else block covers it.
I don't understand this part. I used this command as a base because, similarly to the restore command, the create shard operation involves adding multiple replicas. I think the addReplica command in {{CreateShardCmd}} is called multiple times due to [this for loop|http://github.mtv.cloudera.com/CDH/lucene-solr/blob/cdh6.x/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateShardCmd.java#L91]. I assume this is the reason why multiple failures could be caught.

bq. This way we process requests and responses are very complicated for some reason and we should improve it in general . But do you see what I am seeing here?
Not sure. I think it is more beneficial to see every failure, instead of only the first or last one, especially since the calls are executed in parallel and might have side effects that require cleanup.
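
To illustrate the point about reporting every failure from parallel calls, here is a minimal, self-contained sketch. It is not Solr code: {{FailureCollector}}, {{runAll}} and the task names are hypothetical, and the tasks only stand in for the per-replica addReplica calls. It assumes each call can be modelled as a {{Callable}} that throws on failure.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Solr: runs tasks in parallel and keeps every failure.
final class FailureCollector {

    /** Returns a map of task name to error description, with one entry per failed task. */
    static Map<String, String> runAll(Map<String, Callable<Void>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            // Submit everything first so the tasks really run concurrently.
            Map<String, Future<Void>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Void>> e : tasks.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            // Wait for each task and record its failure instead of stopping at the first one.
            Map<String, String> failures = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Void>> e : futures.entrySet()) {
                try {
                    e.getValue().get();
                } catch (ExecutionException ex) {
                    Throwable cause = ex.getCause() != null ? ex.getCause() : ex;
                    failures.put(e.getKey(), cause.toString());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for " + e.getKey(), ex);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the caller receives one entry per failed task, so it can decide which replicas need cleanup rather than guessing from a single error.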

> Async collection actions should not hide failures
> -------------------------------------------------
>
>                 Key: SOLR-12708
>                 URL: https://issues.apache.org/jira/browse/SOLR-12708
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Admin UI, Backup/Restore
>    Affects Versions: 7.4
>            Reporter: Mano Kovacs
>            Assignee: Varun Thacker
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The async collection API may hide failures compared to the sync version. [OverseerCollectionMessageHandler::processResponses|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/OverseerCollectionMessageHandler.java#L744] structures errors differently in the response, which hides failures from most evaluators. RestoreCmd neither received nor handled async addReplica issues.
> Sample create collection sync and async result with invalid solrconfig.xml:
> {noformat}
> {
> "responseHeader":{
> "status":0,
> "QTime":32104},
> "failure":{
> "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard1_replica_n1': Unable to create core [name4_shard1_replica_n1] Caused by: The content of elements must consist of well-formed character data or markup.",
> "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard2_replica_n2': Unable to create core [name4_shard2_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup.",
> "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard1_replica_n2': Unable to create core [name4_shard1_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup.",
> "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard2_replica_n1': Unable to create core [name4_shard2_replica_n1] Caused by: The content of elements must consist of well-formed character data or markup."}
> }
> {noformat}
> vs async:
> {noformat}
> {
> "responseHeader":{
> "status":0,
> "QTime":3},
> "success":{
> "localhost:8983_solr":{
> "responseHeader":{
> "status":0,
> "QTime":12}},
> "localhost:8983_solr":{
> "responseHeader":{
> "status":0,
> "QTime":3}},
> "localhost:8983_solr":{
> "responseHeader":{
> "status":0,
> "QTime":11}},
> "localhost:8983_solr":{
> "responseHeader":{
> "status":0,
> "QTime":12}}},
> "myTaskId2709146382836":{
> "responseHeader":{
> "status":0,
> "QTime":1},
> "STATUS":"failed",
> "Response":"Error CREATEing SolrCore 'name_shard2_replica_n2': Unable to create core [name_shard2_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup."},
> "status":{
> "state":"completed",
> "msg":"found [myTaskId] in completed tasks"}}
> {noformat}
> Proposing to add a failure node to the results, which keeps the response backward compatible while making it correct.
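
As a rough illustration of why the async response above slips past an evaluator that only looks for a top-level "failure" node, the following sketch checks both shapes. It is not Solr code: {{AsyncResponseCheck}} and {{hasFailure}} are hypothetical names, and the response is assumed to be already parsed into nested maps (a simplification, since the real response can repeat keys such as the node name).

```java
import java.util.Map;

// Hypothetical evaluator, not part of Solr.
final class AsyncResponseCheck {

    /**
     * Returns true if the response has a top-level "failure" node, or if any
     * direct child node carries "STATUS":"failed" (the shape of the async task node).
     * Only direct children are inspected; deeper nesting is not searched.
     */
    static boolean hasFailure(Map<String, Object> response) {
        if (response.containsKey("failure")) {
            return true;
        }
        for (Object value : response.values()) {
            if (value instanceof Map) {
                Map<?, ?> nested = (Map<?, ?>) value;
                if ("failed".equals(nested.get("STATUS"))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

An evaluator that performs only the first check treats the async sample as successful, which is the behavior this issue describes.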



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
