Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2018/07/31 07:15:00 UTC

[jira] [Commented] (SOLR-12607) Investigate ShardSplitTest failures

    [ https://issues.apache.org/jira/browse/SOLR-12607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563230#comment-16563230 ] 

Shalin Shekhar Mangar commented on SOLR-12607:
----------------------------------------------

The testSplitWithChaosMonkey failures increased noticeably after SOLR-11665 was committed. I looked at the logs of a recent failure and here's what I found:

# Shard Split succeeds in creating new sub-shards and new replicas
# The leader node is killed by chaos monkey before the new replicas can become active
# SOLR-11665 kicks in and cleans up (deletes) the sub-shards in construction including all their state from ZK
# The old leader node is started up again and re-registers its local cores, thereby creating state in ZK again. However, this time, since the parent shard information was deleted by the cleanup, the re-created state is missing the parent and range, and the slice state is set to active.
# This causes the assertions in the test to fail. The test asserts that either no sub-shards exist or, if they do, that they are active and recovered.

There are two bugs in play here:
# The async API status of the split shard command is COMPLETED instead of FAILED, which leads the test to believe that the sub-shard slice and replicas should exist, but they don't.
# Our tests still default to legacyCloud=true, which is what allows the restarted node to re-create state in ZK from its local cores after the cleanup has deleted it.
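
To make the first bug concrete, here is a minimal sketch of how the reported async status feeds into what the test expects. It is not the actual ShardSplitTest code: AsyncStatus, SubShard, invariantHolds and consistent are hypothetical names, and the rule that COMPLETED additionally requires sub-shards to exist is an assumption drawn from the description above.

```java
import java.util.Collections;
import java.util.List;

public class SplitOutcomeCheck {
    // Hypothetical stand-in for the async request status; only the two values named above.
    enum AsyncStatus { COMPLETED, FAILED }

    // Hypothetical, simplified view of a sub-shard; not a Solr class.
    static final class SubShard {
        final String name;
        final boolean active;
        final boolean recovered;

        SubShard(String name, boolean active, boolean recovered) {
            this.name = name;
            this.active = active;
            this.recovered = recovered;
        }
    }

    // Base invariant: either no sub-shards exist, or all of them are active and recovered.
    static boolean invariantHolds(List<SubShard> subShards) {
        for (SubShard s : subShards) {
            if (!s.active || !s.recovered) {
                return false;
            }
        }
        return true; // also true for an empty list
    }

    // Assumption: a COMPLETED split additionally requires the sub-shards to exist.
    static boolean consistent(AsyncStatus status, List<SubShard> subShards) {
        if (status == AsyncStatus.COMPLETED && subShards.isEmpty()) {
            return false;
        }
        return invariantHolds(subShards);
    }

    public static void main(String[] args) {
        List<SubShard> none = Collections.emptyList();
        // Cleanup removed the sub-shards, but the status still says COMPLETED.
        System.out.println(consistent(AsyncStatus.COMPLETED, none)); // prints false
        // Had the status been FAILED, having no sub-shards would be acceptable.
        System.out.println(consistent(AsyncStatus.FAILED, none));    // prints true
    }
}
```

Under these assumptions, the same cluster state (no sub-shards) passes or fails depending only on the reported status, which is why a wrong COMPLETED makes the test fail.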

I'll set legacyCloud=false for this test and open another issue to set this to false by default throughout the test suite.
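
For reference, legacyCloud is a cluster property, so one way to change it is the Collections API CLUSTERPROP action. The sketch below only builds the request URL; the class and method names are hypothetical, the base URL is an assumed local default, and the actual test change may well set the property differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LegacyCloudProp {
    // Builds a Collections API CLUSTERPROP request URL for the given property.
    static String clusterPropUrl(String baseUrl, String name, String value) {
        return baseUrl + "/admin/collections?action=CLUSTERPROP"
                + "&name=" + encode(name)
                + "&val=" + encode(value);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a Solr node reachable at the default local address.
        String baseUrl = "http://localhost:8983/solr";
        System.out.println(clusterPropUrl(baseUrl, "legacyCloud", "false"));
    }
}
```

With the assumed base URL this prints http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false.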

> Investigate ShardSplitTest failures
> -----------------------------------
>
>                 Key: SOLR-12607
>                 URL: https://issues.apache.org/jira/browse/SOLR-12607
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Major
>             Fix For: master (8.0), 7.5
>
>
> There have been many recent ShardSplitTest failures. 
> According to http://fucit.org/solr-jenkins-reports/failure-report.html
> {code}
> Class: org.apache.solr.cloud.api.collections.ShardSplitTest
> Method: testSplitWithChaosMonkey
> Failures: 72.32% (81 / 112)
> Class: org.apache.solr.cloud.api.collections.ShardSplitTest
> Method: test
> Failures: 26.79% (30 / 112)
> {code} 


