Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2012/10/25 00:58:12 UTC

[jira] [Created] (HADOOP-8980) TestRPC fails on Windows

Chris Nauroth created HADOOP-8980:
-------------------------------------

             Summary: TestRPC fails on Windows
                 Key: HADOOP-8980
                 URL: https://issues.apache.org/jira/browse/HADOOP-8980
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
    Affects Versions: trunk-win
            Reporter: Chris Nauroth
            Assignee: Chris Nauroth


This failure may indicate a difference in socket handling on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8980) TestRPC fails on Windows

Posted by "Chris Nauroth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488970#comment-13488970 ] 

Chris Nauroth commented on HADOOP-8980:
---------------------------------------

{{TestRPC.testStopsAllThreads}} fails due to a race condition between {{RPC.Server.start}} initializing its internal {{Listener}} and {{Reader}} threads and the test code calling {{countThreads}} to look for {{Reader}} stack frames as evidence that the server is actually running.  Inserting a sleep call made the test pass, though of course this would be a non-deterministic solution.  Question for anyone in the community: is there any expectation that {{org.apache.hadoop.ipc.RPC.Server.start}} is synchronous, by which I mean that immediately after calling {{Server.start}}, the caller expects that startup has completed and the server is ready to receive requests?  If yes, then we need to fix {{Server.start}} to block until initialization is complete.  If no, then we need to fix the test to poll for completion of startup instead of immediately asserting that the server has started.
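
If the answer is no and the test should poll, the polling could look roughly like the sketch below.  This is only an illustration under stated assumptions: {{StartupPolling}}, {{waitFor}}, and the timeout and interval parameters are hypothetical names, not existing Hadoop test utilities, and the condition passed in stands in for a check such as "{{countThreads}} reports at least one {{Reader}}".

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hadoop: polls a condition instead of
// asserting it immediately after an asynchronous start.
class StartupPolling {

    /**
     * Re-checks the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition held in time.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller should fail the test
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a server whose threads come up some time after start() returns.
        AtomicBoolean started = new AtomicBoolean(false);
        Thread starter = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            started.set(true);
        });
        starter.start();

        boolean ok = waitFor(started::get, 5000, 10);
        System.out.println("server observed as started: " + ok);
    }
}
```

Unlike a fixed sleep, this returns as soon as the condition holds and only fails after a generous upper bound, so the test does not depend on how the threads happen to be scheduled.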

For {{TestRPC.testErrorMsgForInsecureClient}}, my best theory right now is that there is a race condition between {{Connection.readAndProcess}} calling {{responder.doRespond(authFailedCall)}} and {{Reader.doRead}} closing the connection, thus causing the {{IOException}} on the client side shown above when the client tries to read.

                
[jira] [Commented] (HADOOP-8980) TestRPC fails on Windows

Posted by "Chris Nauroth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509416#comment-13509416 ] 

Chris Nauroth commented on HADOOP-8980:
---------------------------------------

Hi, Xuan.  I took another look at the {{TestRPC#testErrorMsgForInsecureClient}} failure, and I still think it's a race condition on the server side.

Specifically, {{Server.Connection#readAndProcess}} calls {{Server.Connection#initializeAuthContext}} to check authentication, and if it fails, sets up the "authentication is not enabled" response, and enqueues the response by calling {{responder.doRespond(authFailedCall)}}.  The responder runs a separate thread that loops, dequeues responses, and writes them in {{Server.Responder#processResponse}}.  Meanwhile, back in the thread of {{Server.Connection#readAndProcess}}, the authentication failure also causes it to throw an IOException.  This propagates up to {{Server.Reader#doRead}}, which closes the connection.  If we are unlucky enough to have the connection get closed before the responder thread gets a chance to write the response, then the client doesn't receive the expected response message, and instead we get this exception about connection abort.  It appears that Windows consistently schedules threads just right to expose this problem.

It's possible that your experiment to insert a Thread.sleep in the client-side code interfered with the thread scheduling in such a way that it masked the problem and made the test pass.  It's all running on the same machine, in the same process.

In order to validate my theory that it's a server-side race condition, I came up with an experiment that doesn't involve inserting sleep calls that might interfere with timing.  In {{Server.Reader#doRead}}, I commented out the {{closeConnection(c)}} call.  The test consistently passed when I did this, so I think that validates the theory that it's a server-side problem, and that one side of the race condition is the connection close.

This might indicate that we need to change the {{Server}} code to send the "authentication is not enabled" response synchronously, bypassing the {{Responder}} queue, or find some other way to chain the connection close so that it happens only after the response has been handled normally from the queue.
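
To make the second option concrete, here is a minimal sketch of chaining the close behind the queued response.  It is a simplified model and not the actual {{Server}} code: {{FakeConnection}}, {{Response}}, the {{closeAfterWrite}} flag, and {{failAuthAndClose}} are all hypothetical names, and the single-threaded {{Responder}} here only imitates the real one by draining a queue in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified, hypothetical model of the responder queue; not Hadoop's Server.
class CloseAfterResponse {

    /** Stand-in for a client connection; records what the client would receive. */
    static class FakeConnection {
        private final List<String> received = new ArrayList<>();
        private boolean closed;

        /** Returns false if the connection was already closed, so the message is lost. */
        synchronized boolean write(String message) {
            if (closed) {
                return false;
            }
            received.add(message);
            return true;
        }

        synchronized void close() {
            closed = true;
        }

        synchronized List<String> received() {
            return new ArrayList<>(received);
        }
    }

    /** A queued response, optionally marked to close the connection once written. */
    static class Response {
        final FakeConnection conn;
        final String body;
        final boolean closeAfterWrite;

        Response(FakeConnection conn, String body, boolean closeAfterWrite) {
            this.conn = conn;
            this.body = body;
            this.closeAfterWrite = closeAfterWrite;
        }
    }

    /** Stand-in responder: one thread that writes queued responses in order. */
    static class Responder implements Runnable {
        private static final Response STOP = new Response(null, null, false);
        private final BlockingQueue<Response> queue = new LinkedBlockingQueue<>();

        void doRespond(Response response) {
            queue.add(response);
        }

        void stop() {
            queue.add(STOP);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    Response r = queue.take();
                    if (r == STOP) {
                        return;
                    }
                    r.conn.write(r.body);
                    if (r.closeAfterWrite) {
                        r.conn.close(); // the close can no longer overtake the write
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Simulates an authentication failure and returns what the client received. */
    static List<String> failAuthAndClose(String errorMessage) {
        FakeConnection conn = new FakeConnection();
        Responder responder = new Responder();
        Thread thread = new Thread(responder, "responder");
        thread.start();
        // The reader thread enqueues the response and leaves the close to the responder.
        responder.doRespond(new Response(conn, errorMessage, true));
        responder.stop();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return conn.received();
    }
}
```

In this model the reader thread never closes the connection itself after an authentication failure, so the outcome no longer depends on which thread is scheduled first.  Calling {{close()}} before {{write()}} on a {{FakeConnection}} reproduces the unlucky ordering described above: the write is rejected and the client would see nothing.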
                
[jira] [Commented] (HADOOP-8980) TestRPC fails on Windows

Posted by "Xuan Gong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507735#comment-13507735 ] 

Xuan Gong commented on HADOOP-8980:
-----------------------------------

I agree with you. I think the test failure is caused by a race condition. But I think the race condition happens when the client tries to set up the I/O streams with the server. The client sends the headers to the server and starts the connection thread that waits for the response. The code is in Client.java, in the function setupIOstreams(). If we add Thread.sleep(200L) at the end of that function (between start() and return), the test passes. So I think the test fails because the client tries to write before the I/O stream between the server and the client is set up.
                
[jira] [Commented] (HADOOP-8980) TestRPC fails on Windows

Posted by "Chris Nauroth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483698#comment-13483698 ] 

Chris Nauroth commented on HADOOP-8980:
---------------------------------------

{code}
Running org.apache.hadoop.ipc.TestRPC
Tests run: 14, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 28.181 sec <<< FAILURE!
testErrorMsgForInsecureClient(org.apache.hadoop.ipc.TestRPC)  Time elapsed: 16 sec  <<< ERROR!
java.io.IOException: Failed on local exception: java.io.IOException: An established connection was aborted by the software in your host machine; Host Details : local host is: "WIN-NCDLEQLC13J/10.0.2.15"; destination host is: "WIN-NCDLEQLC13J":59425; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
	at org.apache.hadoop.ipc.Client.call(Client.java:1165)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
	at $Proxy8.echo(Unknown Source)
	at org.apache.hadoop.ipc.TestRPC.testErrorMsgForInsecureClient(TestRPC.java:690)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
	at sun.nio.ch.SocketDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:25)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
	at sun.nio.ch.IOUtil.read(IOUtil.java:171)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
	at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:388)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at java.io.FilterInputStream.read(FilterInputStream.java:66)
	at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:276)
	at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
	at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
	at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
	at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:883)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:814)
{code}

                
[jira] [Commented] (HADOOP-8980) TestRPC fails on Windows

Posted by "Chris Nauroth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509356#comment-13509356 ] 

Chris Nauroth commented on HADOOP-8980:
---------------------------------------

Correction to my comment on 11/1: the first paragraph is discussing the failure in {{TestRPC#testStopsAllThreads}}.
                
[jira] [Commented] (HADOOP-8980) TestRPC fails on Windows

Posted by "Xuan Gong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509371#comment-13509371 ] 

Xuan Gong commented on HADOOP-8980:
-----------------------------------

My comment is about the test error in {{TestRPC#testErrorMsgForInsecureClient}}.
                