Posted to issues@ozone.apache.org by "Arpit Agarwal (Jira)" <ji...@apache.org> on 2023/02/16 03:07:00 UTC

[jira] [Updated] (HDDS-1206) Handle Datanode volume out of space

     [ https://issues.apache.org/jira/browse/HDDS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-1206:
--------------------------------
    Labels:   (was: Triaged)

> Handle Datanode volume out of space
> -----------------------------------
>
>                 Key: HDDS-1206
>                 URL: https://issues.apache.org/jira/browse/HDDS-1206
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Client
>            Reporter: Nilotpal Nandi
>            Assignee: Hanisha Koneru
>            Priority: Major
>
> Steps taken:
> --------------------
>  # Create a 40-datanode cluster.
>  # One of the datanodes has less than 5 GB of free space.
>  # Start writing a key of size 600 MB.
> The operation failed.
> Error on the client:
> ----------------------------
> {noformat}
> Fri Mar 1 09:05:28 UTC 2019 Ruuning /root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 /root/test_files/test_file24
> original md5sum a6de00c9284708585f5a99b0490b0b23
> 2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,457 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 does not exist
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$handlePartialFlush$2(BlockOutputStream.java:393)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,535 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,617 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,741 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,814 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,815 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 does not exist
>  at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$handlePartialFlush$2(BlockOutputStream.java:393)
>  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> java.nio.BufferOverflowException
>  at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:213)
>  at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:128)
>  at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:307)
>  at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:268)
>  at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96)
>  at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:111)
>  at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:53)
>  at picocli.CommandLine.execute(CommandLine.java:919)
>  at picocli.CommandLine.access$700(CommandLine.java:104)
>  at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
>  at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
>  at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
>  at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
>  at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
>  at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:84)
>  at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:95){noformat}
>  
> ozone.log
> -----------------
>  
> {noformat}
> 2019-03-01 09:05:33,248 [IPC Server handler 17 on 9889] DEBUG (OzoneManagerRequestHandler.java:137) - Received OMRequest: cmdType: CreateKey
> traceID: "5f169cde0a4c8a4e:79f0b64c3329c0ba:5f169cde0a4c8a4e:0"
> clientId: "client-86810A76C95E"
> createKeyRequest {
>  keyArgs {
>  volumeName: "testvol172275910-1551431122-1"
>  bucketName: "testbuck172275910-1551431122-1"
>  keyName: "test_file24"
>  dataSize: 629145600
>  type: RATIS
>  factor: THREE
>  isMultipartKey: false
>  }
> }
> ,
> 2019-03-01 09:05:33,255 [IPC Server handler 17 on 9889] DEBUG (KeyManagerImpl.java:465) - Key test_file24 allocated in volume testvol172275910-1551431122-1 bucket testbuck172275910-1551431122-1
> 2019-03-01 09:05:38,229 [IPC Server handler 8 on 9889] DEBUG (OzoneManagerRequestHandler.java:137) - Received OMRequest: cmdType: AllocateBlock
> traceID: "5f169cde0a4c8a4e:fe6c4bdb75978062:5f169cde0a4c8a4e:0"
> clientId: "client-86810A76C95E"
> allocateBlockRequest {
>  keyArgs {
>  volumeName: "testvol172275910-1551431122-1"
>  bucketName: "testbuck172275910-1551431122-1"
>  keyName: "test_file24"
>  dataSize: 629145600
>  }
>  clientID: 20622763490697872
> }
> ,
> 2019-03-01 09:05:38,739 [grpc-default-executor-17] INFO (ContainerUtils.java:149) - Operation: CreateContainer : Trace ID: 5f169cde0a4c8a4e:f340a80dafdf68eb:5f169cde0a4c8a4e:0 : Message: Container creation failed, due to disk out of space : Result: DISK_OUT_OF_SPACE
> 2019-03-01 09:05:38,790 [grpc-default-executor-17] INFO (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 5f169cde0a4c8a4e:f340a80dafdf68eb:5f169cde0a4c8a4e:0 : Message: ContainerID 79 creation failed : Result: DISK_OUT_OF_SPACE
> 2019-03-01 09:05:38,800 [grpc-default-executor-17] DEBUG (ContainerStateMachine.java:358) - writeChunk writeStateMachineData : blockId containerID: 79
> localID: 101674591075108132
> blockCommitSequenceId: 0
>  logIndex 3 chunkName f6508b585fbd0b834b2139939467ac03_stream_8101b9db-a724-4690-abe1-c7daa2630326_chunk_1
> 2019-03-01 09:05:38,801 [grpc-default-executor-17] DEBUG (ContainerStateMachine.java:365) - writeChunk writeStateMachineData completed: blockId containerID: 79
> localID: 101674591075108132
> blockCommitSequenceId: 0
>  logIndex 3 chunkName f6508b585fbd0b834b2139939467ac03_stream_8101b9db-a724-4690-abe1-c7daa2630326_chunk_1
> 2019-03-01 09:05:38,978 [StateMachineUpdater-89770995-a80c-451d-a875-a0d384c2067f] INFO (ContainerStateMachine.java:573) - Gap in indexes at:0 detected, adding dummy entries
> 2019-03-01 09:05:38,979 [StateMachineUpdater-89770995-a80c-451d-a875-a0d384c2067f] INFO (ContainerStateMachine.java:573) - Gap in indexes at:1 detected, adding dummy entries
> 2019-03-01 09:05:38,980 [StateMachineUpdater-89770995-a80c-451d-a875-a0d384c2067f] INFO (ContainerStateMachine.java:573) - Gap in indexes at:2 detected, adding dummy entries
> 2019-03-01 09:05:38,981 [StateMachineUpdater-89770995-a80c-451d-a875-a0d384c2067f] INFO (ContainerUtils.java:149) - Operation: CreateContainer : Trace ID: 5f169cde0a4c8a4e:f340a80dafdf68eb:5f169cde0a4c8a4e:0 : Message: Container creation failed, due to disk out of space : Result: DISK_OUT_OF_SPACE
> 2019-03-01 09:05:38,981 [StateMachineUpdater-89770995-a80c-451d-a875-a0d384c2067f] INFO (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 5f169cde0a4c8a4e:f340a80dafdf68eb:5f169cde0a4c8a4e:0 : Message: ContainerID 79 creation failed : Result: DISK_OUT_OF_SPACE
> 2019-03-01 09:05:39,357 [grpc-default-executor-18] INFO (ContainerUtils.java:149) - Operation: CreateContainer : Trace ID: 5f169cde0a4c8a4e:896673a239485fcd:5f169cde0a4c8a4e:0 : Message: Container creation failed, due to disk out of space : Result: DISK_OUT_OF_SPACE
> 2019-03-01 09:05:39,358 [grpc-default-executor-18] INFO (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 5f169cde0a4c8a4e:896673a239485fcd:5f169cde0a4c8a4e:0 : Message: ContainerID 79 creation failed : Result: DISK_OUT_OF_SPACE
>  
> {noformat}
>  
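
A note on the last client-side exception: the trace shows java.nio.BufferOverflowException raised from HeapByteBuffer.put, called by BlockOutputStream.write. The JDK throws this exception when a put asks for more bytes than the buffer has remaining. The sketch below reproduces only that JDK behavior in isolation. The class name BufferOverflowSketch, the helper tryPut, and the buffer sizes are hypothetical and chosen for illustration; this is not Ozone code, and it makes no claim about why the client buffer was full in this report.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BufferOverflowSketch {

    // Hypothetical helper: copies data into buffer, returning false
    // instead of propagating the exception when the data does not fit.
    public static boolean tryPut(ByteBuffer buffer, byte[] data) {
        try {
            // Throws BufferOverflowException if data.length > buffer.remaining().
            buffer.put(data);
            return true;
        } catch (BufferOverflowException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Heap-backed buffer with room for 8 bytes (size is an assumption).
        ByteBuffer buffer = ByteBuffer.allocate(8);

        // First write fits: 6 of 8 bytes are used afterwards.
        System.out.println(tryPut(buffer, new byte[6]));

        // Second write needs 6 bytes but only 2 remain, so the put overflows.
        System.out.println(tryPut(buffer, new byte[6]));

        System.out.println("remaining = " + buffer.remaining());
    }
}
```

A failed put leaves the buffer position unchanged, so a caller that checks remaining() before writing (or flushes and clears the buffer first) avoids the exception entirely.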


