Posted to issues@solr.apache.org by "Alex Deparvu (Jira)" <ji...@apache.org> on 2023/04/24 23:17:00 UTC

[jira] [Commented] (SOLR-7609) ShardSplitTest NPE

    [ https://issues.apache.org/jira/browse/SOLR-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716031#comment-17716031 ] 

Alex Deparvu commented on SOLR-7609:
------------------------------------

Updating this issue with the details from the GitHub PR for future reference, since the PR is close to being merged.

Changes done:
 * added a version check on additions that fails the request when we are not the leader and version = 0 (to match the delete flows)
 * changed the error status from BAD_REQUEST to INVALID_STATE to allow for retries. I was able to verify that retries are happening [0]
 * removed a 'cmd' variable. This is just a minor readability refactoring; I tried to change as little code as possible
 * updated ShardSplitTest to keep track of exceptions thrown during the concurrent adds and deletes, and to fail if needed
 * fixed wrong NPE check on [DistributedZkUpdateProcessor#getCollectionUrls|https://github.com/apache/solr/blob/db4cb66271f615da6a0a3ae6fed5fb2e184fd053/solr/core/src/java/org/apache/solr/update/processor/DistributedZkUpdateProcessor.java#L889]
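
For illustration, the exception tracking mentioned for ShardSplitTest could look roughly like the sketch below. This is not the code from the PR: the class name BackgroundFailureTracker and its methods are hypothetical, and the sketch only assumes that the adds and deletes run as Runnable tasks on background threads.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper for illustration only; not the actual ShardSplitTest code.
class BackgroundFailureTracker {
  // Thread-safe, so several indexing threads can record failures concurrently.
  private final Queue<Throwable> failures = new ConcurrentLinkedQueue<>();

  /** Wraps a task so that anything it throws is recorded instead of being lost with the thread. */
  Runnable track(Runnable task) {
    return () -> {
      try {
        task.run();
      } catch (Throwable t) {
        failures.add(t);
      }
    };
  }

  int failureCount() {
    return failures.size();
  }

  /** Meant to be called from the test thread once the background threads have been joined. */
  void assertNoFailures() {
    if (!failures.isEmpty()) {
      AssertionError error =
          new AssertionError(failures.size() + " background operation(s) failed");
      // Attach the recorded exceptions so they show up in the test report.
      failures.forEach(error::addSuppressed);
      throw error;
    }
  }
}
```

A test would start its threads with new Thread(tracker.track(task)), join them, and then call assertNoFailures(), so a failed add or delete fails the test instead of only showing up in the logs.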

 Things to follow up on later:
 * there is still one failure happening: `Request says it is coming from parent shard leader but we are in active state`
 * I noticed that the setupRequest() method is usually called twice. I think this is easy to fix with a basic flag; I can add it if it doesn't grow the PR too much, or it can be done in a follow-up PR.
 * throughout the class there is a pattern of checking the read-only status to prevent some operations, which I believe could be broken:
{code:java}
clusterState = zkController.getClusterState();
if (isReadOnly()) {
  throw new SolrException(ErrorCode.FORBIDDEN, "Collection " + collection + " is read-only.");
}
{code}
Refreshing the clusterState is insufficient, because isReadOnly is based on the readOnlyCollection flag, which is only initialized at the beginning. If the intent was to have a fresh check, the readOnlyCollection flag needs to be updated too, based on the new clusterState.
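
To make the concern concrete, here is a minimal standalone sketch of the stale-flag pattern. It does not use Solr's classes: ReadOnlyCheckSketch, liveReadOnly and both methods are hypothetical stand-ins, with the Supplier playing the role of reading the read-only property from a freshly fetched cluster state and cachedReadOnly playing the role of readOnlyCollection.

```java
import java.util.function.Supplier;

// Hypothetical illustration of the stale-flag problem; not the DistributedZkUpdateProcessor code.
class ReadOnlyCheckSketch {
  // Stand-in for "look up the read-only property in the current cluster state".
  private final Supplier<Boolean> liveReadOnly;
  // Stand-in for the readOnlyCollection flag: computed once, at construction time.
  private boolean cachedReadOnly;

  ReadOnlyCheckSketch(Supplier<Boolean> liveReadOnly) {
    this.liveReadOnly = liveReadOnly;
    this.cachedReadOnly = liveReadOnly.get();
  }

  /** Mirrors the questioned pattern: the answer never changes after construction. */
  boolean isReadOnlyStale() {
    return cachedReadOnly;
  }

  /** Possible fix: re-derive the flag from the live state before answering. */
  boolean isReadOnlyFresh() {
    cachedReadOnly = liveReadOnly.get();
    return cachedReadOnly;
  }
}
```

If the collection becomes read-only after the object is created, isReadOnlyStale() keeps returning false, while isReadOnlyFresh() picks up the change. Whether a fresh check is actually the intent in the processor is the open question above.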

> ShardSplitTest NPE
> ------------------
>
>                 Key: SOLR-7609
>                 URL: https://issues.apache.org/jira/browse/SOLR-7609
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Steven Rowe
>            Priority: Minor
>         Attachments: ShardSplitTest.NPE.log
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> I'm guessing this is a test bug, but the seed doesn't reproduce for me (tried on the same Linux machine it occurred on and on OS X):
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=ShardSplitTest -Dtests.method=test -Dtests.seed=9318DDA46578ECF9 -Dtests.slow=true -Dtests.locale=is -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   55.8s J6  | ShardSplitTest.test <<<
>    [junit4]    > Throwable #1: java.lang.NullPointerException
>    [junit4]    > 	at __randomizedtesting.SeedInfo.seed([9318DDA46578ECF9:1B4CE27ECB848101]:0)
>    [junit4]    > 	at org.apache.solr.cloud.ShardSplitTest.logDebugHelp(ShardSplitTest.java:547)
>    [junit4]    > 	at org.apache.solr.cloud.ShardSplitTest.checkDocCountsAndShardStates(ShardSplitTest.java:438)
>    [junit4]    > 	at org.apache.solr.cloud.ShardSplitTest.splitByUniqueKeyTest(ShardSplitTest.java:222)
>    [junit4]    > 	at org.apache.solr.cloud.ShardSplitTest.test(ShardSplitTest.java:84)
>    [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
>    [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
>    [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Line 547 of {{ShardSplitTest.java}} is:
> {code:java}
>       idVsVersion.put(document.getFieldValue("id").toString(), document.getFieldValue("_version_").toString());
> {code}
> Skimming the code, it's not obvious what could be null.


