Posted to common-issues@hadoop.apache.org by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org> on 2011/12/07 03:20:40 UTC

[jira] [Updated] (HADOOP-7888) TestFailoverProxy fails intermittently on trunk

     [ https://issues.apache.org/jira/browse/HADOOP-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HADOOP-7888:
-------------------------------

    Attachment: hadoop-7888.patch

This patch addresses the race condition by moving the thread synchronization out of FlipFlopProxyProvider and into the interface method being invoked.  All threads now block inside the method invocation *before* throwing the exception that triggers the failover.  The failovers are therefore concurrent with respect to RetryInvocationHandler.invoke(), because both threads are always in invokeMethod() at the same time.
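
As a rough illustration of that idea (not the code in the attached patch), the sketch below has every caller wait on a shared CountDownLatch inside the invoked method and only then throw.  The names ConcurrentFailureSketch, Unreliable, SynchronizedFailure and runTwoCallers are hypothetical and chosen for this example; they are not taken from TestFailoverProxy.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentFailureSketch {

    /** Hypothetical stand-in for the unreliable interface used by the test. */
    interface Unreliable {
        String call() throws Exception;
    }

    /** Blocks until every expected caller is inside call(), then fails them all. */
    static class SynchronizedFailure implements Unreliable {
        private final CountDownLatch latch;

        SynchronizedFailure(int callers) {
            this.latch = new CountDownLatch(callers);
        }

        @Override
        public String call() throws Exception {
            // Announce arrival, then wait for the other callers to arrive too.
            latch.countDown();
            if (!latch.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("other caller never arrived");
            }
            // Only now raise the failure that would trigger a failover.
            throw new IOException("simulated failure");
        }
    }

    /** Runs two callers and returns how many saw the simulated failure. */
    static int runTwoCallers() {
        Unreliable target = new SynchronizedFailure(2);
        AtomicInteger failures = new AtomicInteger();
        Runnable caller = () -> {
            try {
                target.call();
            } catch (IOException e) {
                failures.incrementAndGet();
            } catch (Exception e) {
                // Timeout or interruption: not counted as a simulated failure.
            }
        };
        Thread t1 = new Thread(caller);
        Thread t2 = new Thread(caller);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("callers that failed together: " + runTwoCallers());
    }
}
```

Because neither caller can throw until both have counted down, the two failures are raised while both threads are still inside the method, which is the property the test needs.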

Speaking of RetryInvocationHandler.invoke(), per the previous comment I also moved the proxyProvider.getProxy() call so that it occurs only when a failover is performed.  It appears the call was moved out of that condition only to avoid a deadlock when the test's thread synchronization was in FlipFlopProxyProvider.getProxy().
                
> TestFailoverProxy fails intermittently on trunk
> -----------------------------------------------
>
>                 Key: HADOOP-7888
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7888
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.24.0
>            Reporter: Jason Lowe
>         Attachments: hadoop-7888.patch
>
>
> TestFailoverProxy can fail intermittently, with the failures occurring in testConcurrentMethodFailures().  The test has a race condition where the two threads may invoke the unreliable interface sequentially rather than concurrently.  Currently the proxy provider's getProxy() method contains the thread synchronization to enforce a concurrent invocation, but examining the source of RetryInvocationHandler.invoke() shows that the call to getProxy() during failover comes too late to enforce a truly concurrent invocation.
> For this particular test, one thread could race ahead and block on the CountDownLatch in getProxy() before the other thread even enters RetryInvocationHandler.invoke().  If that happens, the second thread will cache the newly updated value of proxyProviderFailoverCount, since the failover has mostly been processed by the original thread.  The second thread therefore assumes no other thread is present and performs its own failover, and the test fails because two failovers occurred instead of one.
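
The effect of the ordering described above can be modelled without threads.  The sketch below is a simplified, single-threaded model and not the actual RetryInvocationHandler code: it assumes a handler that snapshots a failover counter on entry and fails over only if the counter is unchanged when the call fails.  The class and method names (FailoverCountSketch, Handler, snapshot, failoverIfUnchanged) are hypothetical.

```java
public class FailoverCountSketch {

    /** Simplified model of a handler that tracks how many failovers occurred. */
    static class Handler {
        private int failoverCount = 0;

        /** Value a caller caches when it enters the invocation. */
        synchronized int snapshot() {
            return failoverCount;
        }

        /** Fail over only if nobody else has done so since the snapshot. */
        synchronized void failoverIfUnchanged(int seen) {
            if (seen == failoverCount) {
                failoverCount++;
            }
        }

        synchronized int count() {
            return failoverCount;
        }
    }

    /** Both callers enter before either fails: one failover. */
    static int concurrentCase() {
        Handler h = new Handler();
        int a = h.snapshot();
        int b = h.snapshot();
        h.failoverIfUnchanged(a);
        h.failoverIfUnchanged(b); // b is stale, so this is a no-op
        return h.count();
    }

    /** Second caller enters after the first failover: two failovers. */
    static int sequentialCase() {
        Handler h = new Handler();
        int a = h.snapshot();
        h.failoverIfUnchanged(a);
        int b = h.snapshot();     // caches the already-updated count
        h.failoverIfUnchanged(b); // looks like nobody else failed over
        return h.count();
    }

    public static void main(String[] args) {
        System.out.println("concurrent: " + concurrentCase()); // prints "concurrent: 1"
        System.out.println("sequential: " + sequentialCase()); // prints "sequential: 2"
    }
}
```

In this model the sequential interleaving yields two failovers, matching the intermittent test failure, while the concurrent interleaving yields the single failover the test expects.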

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira