Posted to issues@ozone.apache.org by "Marton Elek (Jira)" <ji...@apache.org> on 2020/04/15 09:26:00 UTC

[jira] [Commented] (HDDS-3257) Intermittent timeout in integration tests

    [ https://issues.apache.org/jira/browse/HDDS-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083949#comment-17083949 ] 

Marton Elek commented on HDDS-3257:
-----------------------------------

We analyzed a few of these timeouts with [~shashikant] and [~ljain]. It seems to be an asymmetric communication problem between the leader and one of the followers:

Usually the last leader is elected after receiving only 1 vote response (instead of 2):

{code}
2020-04-07 05:39:52,800 [10efa0c0-b6d1-45a1-855e-ef1ad741a71c@group-9C1A66D474A0-LeaderElection7] INFO  impl.LeaderElection (LeaderElection.java:logAndReturn(61)) - 10efa0c0-b6d1-45a1-855e-ef1ad741a71c@group-9C1A66D474A0-LeaderElection7: Election PASSED; received 1 response(s) [10efa0c0-b6d1-45a1-855e-ef1ad741a71c<-bd7acc78-d58c-493d-9f90-aa900375b793#0:OK-t2] and 0 exception(s); 10efa0c0-b6d1-45a1-855e-ef1ad741a71c@group-9C1A66D474A0:t2, leader=null, voted=10efa0c0-b6d1-45a1-855e-ef1ad741a71c, raftlog=10efa0c0-b6d1-45a1-855e-ef1ad741a71c@group-9C1A66D474A0-SegmentedRaftLog:OPENED:c-1,f-1,i0, conf=-1: [d6a790ea-9667-4c35-b496-e28617be47e4:172.17.0.2:41539, 10efa0c0-b6d1-45a1-855e-ef1ad741a71c:172.17.0.2:45457, bd7acc78-d58c-493d-9f90-aa900375b793:172.17.0.2:36205], old=null
{code}
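
To make the vote count explicit, here is a minimal Python sketch that reads the numbers out of an election log line like the one above. It assumes the usual Raft rule that a candidate votes for itself, so 1 OK response plus the self-vote is already a majority of 3. The function name election_summary and the shortened peer IDs are illustrative only and are not part of Ratis or Ozone.

```python
import re

# Abbreviated copy of the election log line quoted above
# (peer IDs shortened to their first 8 characters for readability).
LINE = (
    "Election PASSED; received 1 response(s) [10efa0c0<-bd7acc78#0:OK-t2] "
    "and 0 exception(s); conf=-1: [d6a790ea:172.17.0.2:41539, "
    "10efa0c0:172.17.0.2:45457, bd7acc78:172.17.0.2:36205], old=null"
)

def election_summary(line):
    """Hypothetical helper: summarize votes in a 'Election PASSED' log line."""
    m = re.search(r"received (\d+) response\(s\) \[(.*?)\]", line)
    responses = int(m.group(1))
    # Count only the responses marked OK.
    ok_votes = sum(1 for r in m.group(2).split(", ") if ":OK" in r)
    peers = re.search(r"conf=-?\d+: \[(.*?)\]", line).group(1).split(", ")
    return {
        "peers": len(peers),
        "majority": len(peers) // 2 + 1,
        # Assumption: the candidate always votes for itself.
        "votes": ok_votes + 1,
        # Peers other than the candidate that sent no response at all.
        "silent_peers": len(peers) - 1 - responses,
    }

print(election_summary(LINE))
```

For the line above this reports 3 peers, a majority of 2, 2 votes and 1 silent peer, which is why the election passes even though one follower never answered.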

But the follower that didn't vote does receive the appendEntries message from the new leader:

{code}
2020-04-07 05:39:52,861 [grpc-default-executor-0] INFO  impl.RaftServerImpl (ServerState.java:setLeader(255)) - d6a790ea-9667-4c35-b496-e28617be47e4@group-9C1A66D474A0: change Leader from null to 10efa0c0-b6d1-45a1-855e-ef1ad741a71c at term 2 for appendEntries, leader elected after 10407ms
{code}

And after the 1-minute timeout, multiple appendEntries requests from the leader to that follower time out:

{code}
2020-04-07 05:41:05,247 [java.util.concurrent.ThreadPoolExecutor$Worker@228750d3[State = -1, empty queue]] WARN  server.GrpcLogAppender (GrpcLogAppender.java:timeoutAppendRequest(212)) - 10efa0c0-b6d1-45a1-855e-ef1ad741a71c@group-9C1A66D474A0->d6a790ea-9667-4c35-b496-e28617be47e4-GrpcLogAppender:  appendEntries Timeout, request=AppendEntriesRequest:cid=39,entriesCount=1,lastEntry=(t:2, i:34)
{code}
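
As a rough cross-check of the timeline, the following Python sketch computes the gaps between the three timestamps quoted in the log excerpts above. The constant and function names are illustrative, and the timestamp format string is inferred from the log lines rather than taken from any logging configuration.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines (comma before millis).
FMT = "%Y-%m-%d %H:%M:%S,%f"

def seconds_between(start, end):
    """Return the elapsed seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

ELECTION_PASSED = "2020-04-07 05:39:52,800"      # leader: Election PASSED
FOLLOWER_SET_LEADER = "2020-04-07 05:39:52,861"  # follower: change Leader from null
APPEND_TIMEOUT = "2020-04-07 05:41:05,247"       # leader: appendEntries Timeout

print(seconds_between(ELECTION_PASSED, FOLLOWER_SET_LEADER))
print(seconds_between(ELECTION_PASSED, APPEND_TIMEOUT))
```

The follower learns about the leader roughly 61 ms after the election, while the quoted timeout warning is logged a little over 72 seconds after it. This fits the picture of leader-to-follower requests arriving while the replies in the other direction do not.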

> Intermittent timeout in integration tests
> -----------------------------------------
>
>                 Key: HDDS-3257
>                 URL: https://issues.apache.org/jira/browse/HDDS-3257
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Attila Doroszlai
>            Assignee: Shashikant Banerjee
>            Priority: Critical
>         Attachments: org.apache.hadoop.fs.ozone.contract.ITestOzoneContractMkdir-output.txt, org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures-output.txt, org.apache.hadoop.ozone.freon.TestOzoneClientKeyGenerator-output.txt, org.apache.hadoop.ozone.freon.TestRandomKeyGenerator-output.txt
>
>
> Even after the changes done in HDDS-3086, some integration tests (especially in it-freon) are intermittently timing out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
