Posted to issues@ratis.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2017/12/03 14:58:01 UTC

[jira] [Commented] (RATIS-163) TestRaftWithHadoopRpc fails because of Hadoop RPC retry logic

    [ https://issues.apache.org/jira/browse/RATIS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275965#comment-16275965 ] 

Hadoop QA commented on RATIS-163:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 55s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m  7s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 47s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.hadooprpc.TestRaftExceptionWithHadoopRpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2017-12-03 |
| JIRA Issue | RATIS-163 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12900378/RATIS-163.001.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux 348d03dd9f15 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 874e48b |
| Default Java | 1.8.0_151 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/50/artifact/out/patch-unit-root.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/50/testReport/ |
| modules | C: ratis-hadoop U: ratis-hadoop |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/50/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> TestRaftWithHadoopRpc fails because of Hadoop RPC retry logic
> -------------------------------------------------------------
>
>                 Key: RATIS-163
>                 URL: https://issues.apache.org/jira/browse/RATIS-163
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>         Attachments: RATIS-163.001.patch
>
>
> During the last qbt nightly build, TestRaftWithHadoopRpc failed.
> The problem can be reproduced locally:
> mvn test -Dtest=TestRaftWithHadoopRpc#testBasicLeaderElection
> The key output is at the end of the log file:
> {code}
> 2017-12-03 15:25:00,966 INFO  ipc.Client (Client.java:handleConnectionFailure(940)) - Retrying connect to server: 0.0.0.0/0.0.0.0:46409. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2017-12-03 15:25:00,967 WARN  ipc.Client (Client.java:handleConnectionFailure(922)) - Failed to connect to server: 0.0.0.0/0.0.0.0:46409: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:679)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:775)
> 	at org.apache.hadoop.ipc.Client$Connection.access$3300(Client.java:410)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1556)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1387)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1351)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngineShaded$Invoker.invoke(ProtobufRpcEngineShaded.java:214)
> 	at com.sun.proxy.$Proxy13.requestVote(Unknown Source)
> 	at org.apache.ratis.hadooprpc.server.HadoopRpcService.lambda$requestVote$4(HadoopRpcService.java:176)
> 	at org.apache.ratis.hadooprpc.server.HadoopRpcService.processRequest(HadoopRpcService.java:188)
> 	at org.apache.ratis.hadooprpc.server.HadoopRpcService.requestVote(HadoopRpcService.java:175)
> 	at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:189)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> In this test case, the unit test kills the leaders one by one. After a leader is killed, the remaining followers still try to connect to it: at every vote request, the running nodes (try to) send a message to the killed nodes.
> However, Hadoop RPC has retry logic enabled by default, so the LeaderElection.submitRequests/requestVote task (which is executed in a separate executor) does not finish even after the LeaderElection is stopped. The requestVote task should normally finish quite fast, but in this case Hadoop RPC tries to reconnect again and again, so the internal executor of the LeaderElection keeps working even though the LeaderElection itself is stopped.
> The easiest way to solve this is to disable the Hadoop IPC retry. I suggest this (at least for now), as the current test failure is not a real test case failure: the JUnit test framework just can't finish the test method because there are still ongoing Hadoop RPC client calls.
> The trickier solution would be to try to stop the existing Hadoop client requests when the LeaderElection is shut down.
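
The behaviour described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it uses no Hadoop or Ratis classes, the class and method names (RetryingVoteDemo, connectWithRetries, stopsWithin) are hypothetical, and the refused connection is modelled as a loop with a fixed sleep, loosely mirroring the RetryUpToMaximumCountWithFixedSleep policy seen in the log. It shows that a retrying task keeps its executor alive after a plain shutdown(), while a task with no retries lets the executor terminate almost immediately.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; no Hadoop or Ratis classes are used here.
class RetryingVoteDemo {

  // Simulates a connect that is always refused and retried with a fixed sleep.
  // Returns the number of attempts made before giving up.
  static int connectWithRetries(int maxRetries, long sleepMillis)
      throws InterruptedException {
    int attempts = 0;
    while (true) {
      attempts++;                  // every attempt fails ("Connection refused")
      if (attempts > maxRetries) {
        return attempts;           // retries exhausted, give up
      }
      Thread.sleep(sleepMillis);   // fixed sleep between attempts
    }
  }

  // Submits one simulated vote request, stops the executor, and reports
  // whether the executor terminated within waitMillis.
  static boolean stopsWithin(int maxRetries, long sleepMillis, long waitMillis,
      boolean interrupt) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Callable<Integer> voteRequest = () -> connectWithRetries(maxRetries, sleepMillis);
    executor.submit(voteRequest);
    if (interrupt) {
      executor.shutdownNow();      // interrupts the running task, drops queued ones
    } else {
      executor.shutdown();         // already submitted tasks are left to finish
    }
    boolean terminated = executor.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
    executor.shutdownNow();        // cleanup so the demo itself can exit promptly
    return terminated;
  }

  public static void main(String[] args) throws InterruptedException {
    // 10 retries x 100 ms sleep needs about 1 s, far more than the 300 ms wait.
    System.out.println("10 retries, shutdown():    " + stopsWithin(10, 100, 300, false));
    // With retries disabled the task gives up after a single attempt.
    System.out.println("0 retries, shutdown():     " + stopsWithin(0, 100, 300, false));
    // Interrupting the task is the simulated version of the trickier solution.
    System.out.println("10 retries, shutdownNow(): " + stopsWithin(10, 100, 300, true));
  }
}
```

On a typical machine the first call is expected to report false and the other two true. Note that the third case only works here because Thread.sleep responds to interrupts; whether an in-flight Hadoop RPC call can be stopped the same way is not established by this sketch. For the first option, the relevant Hadoop setting is most likely ipc.client.connect.max.retries (default 10, which matches the maxRetries=10 in the log), but that is an assumption here and the attached patch may do it differently.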



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)