Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/08/25 13:03:00 UTC

[jira] [Commented] (HDDS-6017) Intermittent failure in 'Put object' s3 acceptance HA test

    [ https://issues.apache.org/jira/browse/HDDS-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584837#comment-17584837 ] 

Attila Doroszlai commented on HDDS-6017:
----------------------------------------

Most recently seen in [this PR run|https://github.com/apache/ozone/runs/8012670589?check_suite_focus=true] ([log bundle|https://github.com/apache/ozone/suites/7975409623/artifacts/341321331]).

There are timeouts on the Ratis pipeline:

{code}
datanode3_1  | 2022-08-25 09:51:49,046 [java.util.concurrent.ThreadPoolExecutor$Worker@b2ee46d[State = -1, empty queue]] WARN server.GrpcLogAppender: fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE->00f70cfa-2742-4097-a66e-29c287bd9fad-GrpcLogAppender:  appendEntries Timeout, request=AppendEntriesRequest:cid=264,entriesCount=1,lastEntry=(t:1, i:1)
...
datanode3_1  | 2022-08-25 09:59:11,254 [java.util.concurrent.ThreadPoolExecutor$Worker@b2ee46d[State = -1, empty queue]] WARN server.GrpcLogAppender: fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE->00f70cfa-2742-4097-a66e-29c287bd9fad-GrpcLogAppender:  appendEntries Timeout, request=AppendEntriesRequest:cid=3633,entriesCount=1,lastEntry=(t:1, i:127)
datanode3_1  | 2022-08-25 10:01:21,199 [null-request--thread4] INFO server.GrpcClientProtocolService: Failed RaftClientRequest:client-4BEC54B18429->fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE, cid=136, seq=0, Watch-ALL_COMMITTED(133), Message:<EMPTY>, reply=RaftClientReply:client-4BEC54B18429->fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE, cid=136, FAILED org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with call Id 136 and log index 133 is not yet replicated to ALL_COMMITTED, logIndex=133, commits[fe732011-a882-4c38-a6e6-852829a28d13:c152, 18a5ef26-3c0e-433b-8ee4-53442ccecad4:c152, 00f70cfa-2742-4097-a66e-29c287bd9fad:c127]
datanode3_1  | 2022-08-25 10:02:22,197 [null-request--thread4] INFO server.GrpcClientProtocolService: Failed RaftClientRequest:client-4A10D7962610->fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE, cid=142, seq=0, Watch-ALL_COMMITTED(142), Message:<EMPTY>, reply=RaftClientReply:client-4A10D7962610->fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE, cid=142, FAILED org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with call Id 142 and log index 142 is not yet replicated to ALL_COMMITTED, logIndex=142, commits[fe732011-a882-4c38-a6e6-852829a28d13:c160, 18a5ef26-3c0e-433b-8ee4-53442ccecad4:c160, 00f70cfa-2742-4097-a66e-29c287bd9fad:c127]
datanode3_1  | 2022-08-25 10:03:23,197 [null-request--thread4] INFO server.GrpcClientProtocolService: Failed RaftClientRequest:client-EC7F05F014C4->fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE, cid=148, seq=0, Watch-ALL_COMMITTED(149), Message:<EMPTY>, reply=RaftClientReply:client-EC7F05F014C4->fe732011-a882-4c38-a6e6-852829a28d13@group-DAEAFB2236BE, cid=148, FAILED org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with call Id 148 and log index 149 is not yet replicated to ALL_COMMITTED, logIndex=149, commits[fe732011-a882-4c38-a6e6-852829a28d13:c164, 18a5ef26-3c0e-433b-8ee4-53442ccecad4:c164, 00f70cfa-2742-4097-a66e-29c287bd9fad:c127]
datanode3_1  | 2022-08-25 10:03:54,030 [org.apache.ratis.util.JvmPauseMonitor$$Lambda$393/0x00000008405e6440@2f3b1db2] WARN util.JvmPauseMonitor: JvmPauseMonitor-fe732011-a882-4c38-a6e6-852829a28d13: Detected pause in JVM or host machine (eg GC): pause of approximately 106211414ns.
datanode3_1  | GC pool 'ParNew' had collection(s): count=1 time=100ms
{code}
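
In the {{NotReplicatedException}} lines above, peer {{00f70cfa-2742-4097-a66e-29c287bd9fad}} stays at {{c127}} while the other two peers advance (c152, c160, c164). The following is a minimal sketch for picking out the lagging peer from such lines. It assumes the {{commits[<peer-id>:c<index>, ...]}} layout seen in this excerpt; the helper names {{parse_commits}} and {{lagging_peers}} are hypothetical and are not part of Ratis or Ozone.

```python
import re

# Assumption: the commit list is printed as "commits[<uuid>:c<index>, ...]",
# as in the log excerpt above. Other Ratis versions may format it differently.
COMMITS_RE = re.compile(r"commits\[([^\]]*)\]")
PEER_RE = re.compile(r"([0-9a-f-]{36}):c(\d+)")


def parse_commits(line):
    """Return {peer_id: commit_index} for one log line, or {} if no commit list is found."""
    match = COMMITS_RE.search(line)
    if not match:
        return {}
    return {peer: int(index) for peer, index in PEER_RE.findall(match.group(1))}


def lagging_peers(line):
    """Return the sorted peer ids whose commit index is below the highest one in the line."""
    commits = parse_commits(line)
    if not commits:
        return []
    highest = max(commits.values())
    return sorted(peer for peer, index in commits.items() if index < highest)
```

Running {{lagging_peers}} over each {{Failed RaftClientRequest}} line shows whether the same peer is behind every time, which is the pattern in the three failures quoted here.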

{code}
s3g_1        | 2022-08-25 10:01:21,127 [qtp864326906-77] WARN scm.XceiverClientRatis: 3 way commit failed on pipeline Pipeline[ Id: 3f552aed-f873-431e-8b2a-daeafb2236be, Nodes: fe732011-a882-4c38-a6e6-852829a28d13{ip: 172.25.0.104, host: ozonesecure-ha_datanode3_1.ozonesecure-ha_ozone_net, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}18a5ef26-3c0e-433b-8ee4-53442ccecad4{ip: 172.25.0.102, host: ozonesecure-ha_datanode1_1.ozonesecure-ha_ozone_net, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}00f70cfa-2742-4097-a66e-29c287bd9fad{ip: 172.25.0.103, host: ozonesecure-ha_datanode2_1.ozonesecure-ha_ozone_net, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:fe732011-a882-4c38-a6e6-852829a28d13, CreationTimestamp2022-08-25T09:49:37.829Z[UTC]]
s3g_1        | java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.TimeoutIOException: Request #136 timeout 180s
s3g_1        | 	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
s3g_1        | 	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
s3g_1        | 	at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:284)
...
s3g_1        | 	at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
s3g_1        | 	at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.createMultipartKey(ObjectEndpoint.java:785)
s3g_1        | 	at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.put(ObjectEndpoint.java:190)
{code}

> Intermittent failure in 'Put object' s3 acceptance HA test
> ----------------------------------------------------------
>
>                 Key: HDDS-6017
>                 URL: https://issues.apache.org/jira/browse/HDDS-6017
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Priority: Major
>
> Failing run seen here: [https://github.com/apache/ozone/runs/4169188308?check_suite_focus=true]
> And on the master branch: [https://github.com/apache/ozone/runs/4163572757?check_suite_focus=true]
>  
> {code:java}
> Zero byte file     | FAIL | 
> '
> An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist' does not contain 'InvalidRange' {code}
>  


