Posted to dev@lucene.apache.org by "Yonik Seeley (Created) (JIRA)" <ji...@apache.org> on 2012/02/29 15:27:57 UTC

[jira] [Created] (SOLR-3180) ChaosMonkey test failures

ChaosMonkey test failures
-------------------------

                 Key: SOLR-3180
                 URL: https://issues.apache.org/jira/browse/SOLR-3180
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
            Reporter: Yonik Seeley


Handle intermittent failures in the ChaosMonkey tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3180) ChaosMonkey test failures

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221712#comment-13221712 ] 

Yonik Seeley commented on SOLR-3180:
------------------------------------

Failures are *much* less frequent, but I still got one after about 7 hours, I think.
I saw a commit fail (due to the interrupted exception), but I later saw IW.close() succeed, which caused Solr to cap the log file on the assumption that everything was in the index.

As a result, I just committed a change to the shutdown code to do an explicit commit.
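
To make the idea concrete, here is a minimal sketch of a shutdown path that commits explicitly and only treats the log as fully indexed when that commit is known to have succeeded. It is not the change that was committed: ToyWriter, shutdown(), and the boolean return value are hypothetical names invented for this illustration, and the real shutdown code may be structured quite differently.

```java
public class ShutdownCommitSketch {

    /** Hypothetical stand-in for an index writer; this is not Lucene's IndexWriter. */
    interface ToyWriter {
        void commit() throws Exception;
        void close() throws Exception;
    }

    /**
     * Commits explicitly, then closes the writer.
     * Returns true only if the explicit commit succeeded, i.e. only then
     * would it be safe to assume that everything in the log is in the index.
     */
    static boolean shutdown(ToyWriter writer) {
        boolean committed = false;
        try {
            writer.commit();      // the outcome of this call is observed directly
            committed = true;
        } catch (Exception e) {
            // Commit failed (for example after an interrupt): keep the log for replay.
        } finally {
            try {
                writer.close();   // a successful close alone does not set 'committed'
            } catch (Exception e) {
                // Close failures are ignored in this sketch.
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        ToyWriter interrupted = new ToyWriter() {
            public void commit() throws Exception {
                throw new InterruptedException("simulated interrupt during commit");
            }
            public void close() {
                // close succeeds even though the commit failed
            }
        };
        System.out.println("safe to cap log: " + shutdown(interrupted));
    }
}
```

In this toy model the decision depends on the commit result rather than on whether close() returned normally, which is the situation described above where the close succeeded after a failed commit.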
                


[jira] [Commented] (SOLR-3180) ChaosMonkey test failures

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221300#comment-13221300 ] 

Yonik Seeley commented on SOLR-3180:
------------------------------------

Just checked in a fix for this as well as a test that recovers from more than one tlog at startup. 
                


[jira] [Commented] (SOLR-3180) ChaosMonkey test failures

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219246#comment-13219246 ] 

Yonik Seeley commented on SOLR-3180:
------------------------------------

We've come a long way: the monkey has uncovered a number of bugs that we've fixed, and it is helping to make a really solid solution.

I just uncovered another one having to do with races on shutdown.
When we kill the Jetty instance, it can cause an interrupted exception that closes the underlying NIO files underneath Lucene.
If a commit is happening concurrently, we can end up with more than one unfinished transaction log.

We call preCommit, which moves the current tlog to prevTlog.
The commit fails, but other updates are coming in concurrently and they cause a new tlog to be created.
Updates that come in after this point can still succeed, since they are simply buffered in memory by the IW.
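
The sequence above can be replayed in a small, single-threaded toy model. This is only a sketch under stated assumptions: ToyUpdateLog, its methods, and the "tlog.N" names are hypothetical and do not mirror Solr's actual update log code. The field names tlog and prevTlog are taken from the description above, and the concurrent update is modelled as a call made between preCommit and the (failed) commit.

```java
import java.util.ArrayList;
import java.util.List;

public class TlogRaceSketch {

    /** Toy stand-in for an update log; not Solr's real implementation. */
    static class ToyUpdateLog {
        private int nextId = 0;
        String tlog;      // log currently receiving updates, or null
        String prevTlog;  // log handed off to an in-flight commit, or null

        void add(String doc) {
            if (tlog == null) {
                tlog = "tlog." + nextId++;  // a new log is created on the first update
            }
        }

        void preCommit() {
            prevTlog = tlog;  // the current log becomes the previous one
            tlog = null;
        }

        void postCommit() {
            prevTlog = null;  // only reached when the commit succeeded
        }

        List<String> unfinishedLogs() {
            List<String> logs = new ArrayList<>();
            if (prevTlog != null) logs.add(prevTlog);
            if (tlog != null) logs.add(tlog);
            return logs;
        }
    }

    /** Runs the interleaving and returns the logs that are still unfinished. */
    static List<String> simulate(boolean commitSucceeds) {
        ToyUpdateLog log = new ToyUpdateLog();
        log.add("doc1");       // goes to tlog.0
        log.preCommit();       // tlog.0 moves to prevTlog
        log.add("doc2");       // update arriving during the commit creates tlog.1
        if (commitSucceeds) {
            log.postCommit();  // skipped when the commit fails
        }
        return log.unfinishedLogs();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints [tlog.0, tlog.1]
        System.out.println(simulate(true));  // prints [tlog.1]
    }
}
```

With a failed commit the model ends up holding two unfinished logs, which is the case that recovery at startup has to handle.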
                