Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2022/04/20 07:58:00 UTC

[jira] [Updated] (HDDS-6614) EC: Fix Datanode block file INCONSISTENCY during heavy load.

     [ https://issues.apache.org/jira/browse/HDDS-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Gui updated HDDS-6614:
---------------------------
    Description: 
While stressing an Ozone cluster with heavy EC writes and reads, we hit exactly one exception, on a single DN:
{code:java}
2022-04-14 15:02:04,422 [ChunkReader-46] INFO org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: WriteChunk , Trace ID:  , Message: Chunk file offset 1048576 does not match blockFile length 0 , Result: CHUNK_FILE_INCONSISTENCY , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunk file offset 1048576 does not match blockFile length 0
        at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.validateChunkSize(ChunkUtils.java:391)
        at org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.writeChunk(FilePerBlockStrategy.java:140)
        at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.writeChunk(ChunkManagerDispatcher.java:73)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:705)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:224)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:188)
        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:307)
        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$14(HddsDispatcher.java:169)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
        at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:57)
        at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:50)
        at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:255)
        at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
        at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
        at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-04-14 15:02:04,441 [ChunkReader-46] WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614 to state UNHEALTHY from state:UNHEALTHY Trace:java.lang.Thread.getStackTrace(Thread.java:1559) {code}
And this results in an UNHEALTHY container replica:
{code:java}
[ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin container info 614
Container id: 614
Pipeline id: bddd5494-af98-46b2-a4c3-6757f129d3d2
Container State: CLOSED
Datanodes: [...]
Replicas: [State: CLOSED; ReplicaIndex: 7; Origin: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad; Location: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad/9.27.141.78,
State: CLOSED; ReplicaIndex: 10; Origin: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea; Location: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea/9.27.142.97,
State: CLOSED; ReplicaIndex: 9; Origin: 65024339-4831-43c0-9fc0-752f8999d9c4; Location: 65024339-4831-43c0-9fc0-752f8999d9c4/9.27.140.135,
State: CLOSED; ReplicaIndex: 8; Origin: e963db9e-44ea-425c-92e3-bdb5305c3049; Location: e963db9e-44ea-425c-92e3-bdb5305c3049/9.27.142.51,
State: CLOSED; ReplicaIndex: 6; Origin: b6115b17-827c-484f-b61a-5f5849c5dbdb; Location: b6115b17-827c-484f-b61a-5f5849c5dbdb/9.27.141.121,
State: UNHEALTHY; ReplicaIndex: 12; Origin: ae31f944-b6c9-4e18-b677-0b0edb2a8e13; Location: ae31f944-b6c9-4e18-b677-0b0edb2a8e13/9.27.141.78,
State: CLOSED; ReplicaIndex: 2; Origin: 4a6af5ae-097a-4daa-9390-b0fbe87446f1; Location: 4a6af5ae-097a-4daa-9390-b0fbe87446f1/9.27.144.159,
State: CLOSED; ReplicaIndex: 3; Origin: 3c3bc79e-3b5f-4f73-b502-7778a7993db6; Location: 3c3bc79e-3b5f-4f73-b502-7778a7993db6/9.27.140.38,
State: CLOSED; ReplicaIndex: 14; Origin: 37db964c-0663-4b7c-915c-eaffccc32f55; Location: 37db964c-0663-4b7c-915c-eaffccc32f55/9.27.142.51,
State: CLOSED; ReplicaIndex: 11; Origin: dc262c7c-184e-4f66-af98-2ece21deeab1; Location: dc262c7c-184e-4f66-af98-2ece21deeab1/9.27.140.222,
State: CLOSED; ReplicaIndex: 13; Origin: e7ea47c7-6d33-490a-9d3a-3c8231c81e30; Location: e7ea47c7-6d33-490a-9d3a-3c8231c81e30/9.27.144.169,
State: CLOSED; ReplicaIndex: 1; Origin: 8333d554-2e39-4ef6-b73a-01539e82a9d1; Location: 8333d554-2e39-4ef6-b73a-01539e82a9d1/9.27.144.226,
State: CLOSED; ReplicaIndex: 4; Origin: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae; Location: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae/9.27.141.230,
State: CLOSED; ReplicaIndex: 5; Origin: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa; Location: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa/9.27.140.135] {code}
From the log we can see that a block file received a write request at offset 1048576 while the block file had length 0.

This should not be possible, because the EC client writes stripes one by one: it does not send the chunks of the next stripe to the DNs before it has received success responses from the DNs for the previous stripe.

So I suspect the problem lies in the check done by the DN itself.
{code:java}
public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
    throws StorageContainerException {
  long offset = chunkInfo.getOffset();
  long len = chunkFile.length();
  if (chunkFile.length() != offset) {
    throw new StorageContainerException(
        "Chunk file offset " + offset + " does not match blockFile length " +
        len, CHUNK_FILE_INCONSISTENCY);
  }
} {code}
Here we get the length of the existing block file with `chunkFile.length()`, which is suspicious: we write to block files through FileChannels, so we should read the file size from the channel as well. The channel may not have created the block file on disk yet at the moment we call `chunkFile.length()`, in which case we get 0. This is likely the cause of the problem.
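A minimal sketch of the kind of change I have in mind (not the actual patch): validate against the size reported by the already-open FileChannel instead of the File object. The overload and its parameters below are illustrative only, and the usual java.nio imports plus the same CHUNK_FILE_INCONSISTENCY static import as above are assumed.
{code:java}
public static void validateChunkSize(FileChannel fileChannel,
    ChunkInfo chunkInfo, String fileName) throws StorageContainerException {
  long offset = chunkInfo.getOffset();
  long fileLen;
  try {
    // Ask the open channel for its size; unlike File.length(), this cannot
    // return 0 just because the path is not (yet) visible on disk.
    fileLen = fileChannel.size();
  } catch (IOException e) {
    throw new StorageContainerException(
        "IO error while reading the size of " + fileName,
        CHUNK_FILE_INCONSISTENCY);
  }
  if (fileLen != offset) {
    throw new StorageContainerException(
        "Chunk file offset " + offset + " does not match blockFile length " +
        fileLen, CHUNK_FILE_INCONSISTENCY);
  }
} {code}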

It is hard to reproduce this problem with simple tests; it only shows up under heavy load. I did, however, do a small experiment to show the difference between `fileChannel.size()` and `file.length()`; the small program can be found in the attachments.
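The attached program is not reproduced here, but the following is a minimal, hypothetical sketch of such an experiment (class name and file path are made up). It illustrates that `File.length()` silently returns 0 for a path that does not exist yet, while the open `FileChannel` reports the bytes written through it:
{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class LengthVsSizeDemo {
  public static void main(String[] args) throws IOException {
    File file = new File("block-file-demo.dat");

    // File.length() does not throw for a missing file; it just returns 0,
    // which is exactly the value reported in the exception above.
    System.out.println("file.length() before creation = " + file.length());

    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
         FileChannel channel = raf.getChannel()) {
      channel.write(ByteBuffer.wrap(
          "first chunk".getBytes(StandardCharsets.UTF_8)));
      // The open channel reports the size of the data written through it ...
      System.out.println("channel.size() = " + channel.size());
      // ... while File.length() re-reads file system metadata on every call.
      System.out.println("file.length()  = " + file.length());
    }
  }
} {code}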

 

> EC: Fix Datanode block file INCONSISTENCY during heavy load.
> ------------------------------------------------------------
>
>                 Key: HDDS-6614
>                 URL: https://issues.apache.org/jira/browse/HDDS-6614
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org