Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2015/10/20 02:46:28 UTC
[jira] [Comment Edited] (SOLR-8135) SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection reproducible failure

    [ https://issues.apache.org/jira/browse/SOLR-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964325#comment-14964325 ] 

Erick Erickson edited comment on SOLR-8135 at 10/20/15 12:46 AM:
-----------------------------------------------------------------

So far, this patch keeps the test from failing. I hacked it in based on two hints: buried in the failure case was a message about "could not reload core blah blah blah", and the test succeeded when I commented out the update config bits.

Maybe there's a race condition between multiple API calls modifying the configs and the core reloading? This patch forces the collection to reload as part of updating the config...

[~noble.paul] [~markrmiller@gmail.com] [~yonik@apache.org]
This feels like a band-aid, though. I see in the ref guide that listeners are registered on the zknode and that reloads happen at some point, although I'm unclear on exactly what triggers them. It seems like any modification of the config file should be atomic, in the sense that as long as the updates are valid, any core reload should get a config that loads successfully. How do we guarantee atomic reads/writes of the config files, especially when the read path is different from the write path?
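To make the atomicity question concrete, here is a minimal sketch of one possible approach: version-checked (compare-and-set) writes paired with whole-snapshot reads. It is an in-memory stand-in, not Solr or ZooKeeper code; the class VersionedConfigStore and all of its methods are hypothetical names made up for illustration. ZooKeeper's setData does accept an expected version, but whether the config update path here uses it is an assumption, not something verified against the code.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for a versioned config node; not Solr or ZooKeeper code. */
public class VersionedConfigStore {

    /** Immutable snapshot: the config bytes plus the version they were written at. */
    public static final class Snapshot {
        public final int version;
        private final byte[] data;

        Snapshot(int version, byte[] data) {
            this.version = version;
            this.data = data.clone();
        }

        public String text() {
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    private final AtomicReference<Snapshot> current;

    public VersionedConfigStore(String initial) {
        current = new AtomicReference<>(
                new Snapshot(0, initial.getBytes(StandardCharsets.UTF_8)));
    }

    /** Readers always get one complete snapshot, never a half-applied update. */
    public Snapshot read() {
        return current.get();
    }

    /** Writes only if nobody else has written since expectedVersion was read. */
    public boolean compareAndSet(int expectedVersion, String newText) {
        Snapshot seen = current.get();
        if (seen.version != expectedVersion) {
            return false; // stale writer: caller should re-read and retry
        }
        Snapshot next = new Snapshot(
                expectedVersion + 1, newText.getBytes(StandardCharsets.UTF_8));
        // Fails if another writer swapped the snapshot in the meantime.
        return current.compareAndSet(seen, next);
    }

    public static void main(String[] args) {
        VersionedConfigStore store = new VersionedConfigStore("{}");
        // Two API calls read the same version, then both try to write.
        int seenByA = store.read().version;
        int seenByB = store.read().version;
        System.out.println(store.compareAndSet(seenByA, "{\"a\":1}")); // true
        System.out.println(store.compareAndSet(seenByB, "{\"b\":2}")); // false
        System.out.println(store.read().version + " " + store.read().text()); // 1 {"a":1}
    }
}
```

With a scheme like this, the second writer would have to re-read and reapply its change, and a core reload that reads a snapshot would see either the old config or the new one in full.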

I don't have the logs right now, so I don't have a good sense of what in the core reload was really failing. I'll see if I can get some of that info.

NOTE: I'm on a plane, so I could only beast this lightly. Nevertheless, I never got 4 successful runs before this patch and got 16 with it, so it certainly seems to be in the right neighborhood. I'll be able to give it a more thorough spin when I'm not on battery, assuming we think this is the right fix.



> SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection reproducible failure
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-8135
>                 URL: https://issues.apache.org/jira/browse/SOLR-8135
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: Trunk
>            Reporter: Hoss Man
>         Attachments: SOLR-8135.failure.log, SOLR-8135.patch
>
>
> No idea what's going on here, noticed it while testing out an unrelated patch -- seed reproduces against pristine trunk...
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=SolrCloudExampleTest -Dtests.method=testLoadDocsIntoGettingStartedCollection -Dtests.seed=59EA523FFF6CB60F -Dtests.slow=true -Dtests.locale=es_MX -Dtests.timezone=Africa/Porto-Novo -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
>    [junit4] FAILURE 49.5s | SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: Delete action failed!
>    [junit4]    > 	at __randomizedtesting.SeedInfo.seed([59EA523FFF6CB60F:4A896050CE030FA9]:0)
>    [junit4]    > 	at org.apache.solr.cloud.SolrCloudExampleTest.doTestDeleteAction(SolrCloudExampleTest.java:169)
>    [junit4]    > 	at org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection(SolrCloudExampleTest.java:145)
>    [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
>    [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
> {noformat}


