Posted to dev@lucene.apache.org by "Gregory Chanan (JIRA)" <ji...@apache.org> on 2014/02/13 01:13:19 UTC

[jira] [Updated] (SOLR-5721) ConnectionManager can become stuck in likeExpired

     [ https://issues.apache.org/jira/browse/SOLR-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Chanan updated SOLR-5721:
---------------------------------

    Attachment: SOLR-5721test.patch

Here's a rough test that demonstrates the failure by inserting strategic pauses.  I don't think we'd want such a test, but I couldn't reproduce the bug with a nicer test.  What I did in the nicer test was:
- disconnect
- try to reconnect around the time the disconnect thread would run (with a small random offset)

If I added something small to the disconnect thread, like a System.err call, the nicer test would reproduce the issue.  I can attach that version if people are interested.
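For illustration, the interleaving from the issue can be forced deterministically with latches rather than sleeps. This is a hypothetical, stripped-down model, not the attached patch: only the connected and likelyExpired fields come from the issue, while the class, its methods, and the use of an ExecutorService (instead of whatever timer ConnectionManager actually uses) are assumptions. Future.cancel(false), like java.util.TimerTask.cancel(), has no effect on a task that is already running, which is the gap the issue describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the SOLR-5721 race. Only the "connected" and
// "likelyExpired" fields mirror the issue; everything else is illustrative.
final class LikelyExpiredRace {
  private boolean connected = true;
  private boolean likelyExpired = false;

  synchronized void onDisconnected() {
    connected = false;
  }

  synchronized void onConnected() {
    connected = true;
    likelyExpired = false;
  }

  synchronized boolean isLikelyExpired() {
    return likelyExpired;
  }

  // Waits on a latch, converting interruption into an unchecked exception.
  private static void await(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
  }

  // Forces: disconnect -> task starts -> reconnect + cancel -> task finishes.
  // Returns the final value of likelyExpired.
  static boolean runRace(boolean guarded) {
    LikelyExpiredRace cm = new LikelyExpiredRace();
    CountDownLatch taskStarted = new CountDownLatch(1);
    CountDownLatch reconnected = new CountDownLatch(1);
    CountDownLatch taskDone = new CountDownLatch(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      cm.onDisconnected();
      Future<?> disconnectTask = pool.submit(() -> {
        try {
          taskStarted.countDown(); // running now, no longer just scheduled
          await(reconnected);      // stands in for a "strategic pause"
          if (guarded) {
            // The fix proposed in the issue: re-check under the lock.
            synchronized (cm) {
              if (!cm.connected) cm.likelyExpired = true;
            }
          } else {
            cm.likelyExpired = true; // unconditional write: the bug
          }
        } finally {
          taskDone.countDown();
        }
      });
      await(taskStarted);
      cm.onConnected();             // sets likelyExpired = false
      disconnectTask.cancel(false); // does not stop a task that is already running
      reconnected.countDown();
      await(taskDone);
      return cm.isLikelyExpired();
    } finally {
      pool.shutdown();
    }
  }
}
```

With guarded = false, runRace returns true: likelyExpired is stuck even though we are connected. With guarded = true, the re-check under the lock sees connected == true and leaves the flag false.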

> ConnectionManager can become stuck in likeExpired
> -------------------------------------------------
>
>                 Key: SOLR-5721
>                 URL: https://issues.apache.org/jira/browse/SOLR-5721
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0, 4.7
>            Reporter: Gregory Chanan
>         Attachments: SOLR-5721test.patch
>
>
> Here is the sequence of events:
> - We disconnect
> - The disconnect timer task begins to run (so it is no longer merely scheduled), but doesn't set likelyExpired yet
> - We connect and set likelyExpired = false
> - The disconnect thread continues, sets likelyExpired to true, and nothing ever sets it back to false (note that we cancel the disconnect thread, but cancellation only affects tasks that are still scheduled, not tasks that are already running).
> This is pretty difficult to reproduce without doing more work in the disconnect thread.  It's easy to reproduce by adding sleeps in various places; I have a test that does that, which I'll attach.
> The most straightforward way to solve this would be to grab the synchronization lock on ConnectionManager in the disconnect thread, check that we aren't actually connected, and only then set likelyExpired to true.  In code:
> {code}
> synchronized (ConnectionManager.this) {
>   // re-check under the lock: a reconnect may have happened after this task started
>   if (!connected) likelyExpired = true;
> }
> {code}
> but this is all pretty subtle and error-prone.  It's easier to get rid of the disconnect thread entirely and just record the time we last disconnected.  Then, when we check likelyExpired, we do a quick calculation on the time elapsed since that disconnect to decide whether we are likely expired.
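The timestamp-based alternative might look roughly like the sketch below. It is a hypothetical illustration, not the eventual patch: the class and method names, the injectable LongSupplier clock, and the assumption that the threshold is the session timeout are all made up for the example. Because there is no background task, nothing can run after a reconnect and overwrite its result; "likely expired" is derived on demand from state guarded by a single lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch of the timer-free approach: no background task, just a
// timestamp recorded on disconnect and a comparison when the state is queried.
// Class and method names are illustrative, not Solr's actual API.
final class DisconnectTracker {
  private final long expirationNanos;   // assumed: e.g. the ZooKeeper session timeout
  private final LongSupplier nanoClock; // System::nanoTime in production; injectable for tests
  private boolean connected = true;
  private long disconnectedAtNanos;     // meaningful only while !connected

  DisconnectTracker(long expiration, TimeUnit unit, LongSupplier nanoClock) {
    this.expirationNanos = unit.toNanos(expiration);
    this.nanoClock = nanoClock;
  }

  synchronized void onDisconnected() {
    // Keep the first disconnect time if several disconnect events arrive in a row.
    if (connected) {
      connected = false;
      disconnectedAtNanos = nanoClock.getAsLong();
    }
  }

  synchronized void onConnected() {
    connected = true;
  }

  synchronized boolean isLikelyExpired() {
    if (connected) {
      return false;
    }
    // Subtract first so the comparison stays correct if nanoTime wraps around.
    return nanoClock.getAsLong() - disconnectedAtNanos >= expirationNanos;
  }
}
```

In production the tracker would be constructed with System::nanoTime; a test can pass a fake clock (for example a lambda reading a mutable long) to step time forward without sleeping.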



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
