Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/09/15 09:58:00 UTC

[jira] [Commented] (HDDS-9292) Intermittent failure in S3 multipart copy with HA proxy

    [ https://issues.apache.org/jira/browse/HDDS-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765537#comment-17765537 ] 

Attila Doroszlai commented on HDDS-9292:
----------------------------------------

Hi [~ashishk], this started happening on 2023-08-30, right after HDDS-9078 was merged.  If the netty temp dir is in use, there may be a race condition: {{OZONE_HOME/temp}} is shared between all datanodes in the docker compose environment, because {{OZONE_HOME}} is bind-mounted from the local host.  This is just a hunch and I may be totally wrong, though.  Any commit (or even a GitHub environment change) before that date may be causing this.  Could you please try to debug?
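
To make the shared-directory concern concrete, here is a minimal sketch of one way several processes can avoid colliding in a common temp directory: each takes its own uniquely named subdirectory.  This is not Ozone or netty code; the class {{PerNodeTempDir}}, the helper {{createFor}}, the node names and the file name {{native.lib}} are all hypothetical, and whether the real failure is related to the temp dir at all is unverified.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PerNodeTempDir {

    // Hypothetical helper: create a uniquely named subdirectory for one node
    // under a base directory that several nodes share.
    static Path createFor(Path sharedBase, String nodeId) throws IOException {
        Files.createDirectories(sharedBase);
        // createTempDirectory appends a random suffix to the prefix, so two
        // callers get distinct directories even if they pass the same nodeId.
        return Files.createTempDirectory(sharedBase, nodeId + "-");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the shared, bind-mounted directory (assumption).
        Path shared = Files.createTempDirectory("shared-temp");

        Path dn1 = createFor(shared, "datanode_1");
        Path dn2 = createFor(shared, "datanode_2");

        // Both nodes write a file with the same name, but in separate
        // directories, so neither overwrites the other's copy.
        Files.writeString(dn1.resolve("native.lib"), "from datanode_1");
        Files.writeString(dn2.resolve("native.lib"), "from datanode_2");

        System.out.println(Files.readString(dn1.resolve("native.lib")));
        System.out.println(Files.readString(dn2.resolve("native.lib")));
    }
}
```

If both nodes wrote {{native.lib}} directly into the shared base instead, the second write would replace the first, which is the kind of interference the hunch above is about.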

> Intermittent failure in S3 multipart copy with HA proxy
> -------------------------------------------------------
>
>                 Key: HDDS-9292
>                 URL: https://issues.apache.org/jira/browse/HDDS-9292
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode, S3
>            Reporter: Attila Doroszlai
>            Assignee: Ashish Kumar
>            Priority: Major
>
> Download of object created via multipart upload in S3 HA environment is failing intermittently.
> {code}
> Executing test ozone/test-s3-haproxy.sh
> ...
> Test Multipart Upload with the simplified aws s3 cp API               | FAIL |
> ...
> ERROR: Test execution of ozone/test-s3-haproxy.sh is FAILED!!!!
> {code}
> S3 Gateway log:
> {code}
> 2023-09-15 05:06:57,143 [qtp2112233878-18] ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 3e70c06d-5798-4025-8791-78b8b2500f11, Nodes: 9416081d-f44a-4263-bc28-1838037e3ca1(ozone_datanode_1.ozone_default/172.19.0.7)82a30eb8-2baa-482f-a23a-ad010ae2684d(ozone_datanode_3.ozone_default/172.19.0.10)c229b2ba-c364-4c38-879e-952c8da62282(ozone_datanode_2.ozone_default/172.19.0.3), ReplicationConfig: STANDALONE/THREE, State:OPEN, leaderId:c229b2ba-c364-4c38-879e-952c8da62282, CreationTimestamp2023-09-15T05:05:12.106Z[UTC]].
> 2023-09-15 05:06:57,144 [qtp2112233878-18] WARN storage.ContainerProtocolCalls: Failed to get block #111677748019200096 in container #1 from 9416081d-f44a-4263-bc28-1838037e3ca1(ozone_datanode_1.ozone_default/172.19.0.7); will try another datanode.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: bcsId 388 mismatches with existing block Id 387 for block conID: 1 locID: 111677748019200096 bcsId: 388.
> 	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:674)
> 	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:685)
> 	at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
> 	at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
> 	at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
> 	at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
> 	at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
> 	at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
> 	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
> 	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
> 	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
> 	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
> 	at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
> 	at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
> 	at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
> 	at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
> 	at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:57)
> 	at org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:96)
> 	at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
> 	at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:56)
> 	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1384)
> 	at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.lambda$get$1(ObjectEndpoint.java:407)
> {code}
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/08/30/25061/acceptance-misc
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/12/25292/acceptance-misc
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/13/25337/acceptance-misc
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/13/25345/acceptance-misc
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/14/25355/acceptance-misc
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/15/25394/acceptance-misc



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
