Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2021/05/10 08:09:00 UTC
[jira] [Assigned] (RATIS-1375) Handle bad storage dir due to disk failures
[ https://issues.apache.org/jira/browse/RATIS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze reassigned RATIS-1375:
---------------------------------
Assignee: Mark Gui
> Handle bad storage dir due to disk failures
> -------------------------------------------
>
> Key: RATIS-1375
> URL: https://issues.apache.org/jira/browse/RATIS-1375
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When testing Ozone with a bad Ratis volume, we hit the following error in the log:
> ```
> 2021-05-06 18:19:48,166 [Command processor thread] ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler: Can't create pipeline RATIS THREE PipelineID=08de41a6-5c9e-48d4-9789-4c09798ecffd
> java.io.IOException: Input/output error
>     at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.addGroup(XceiverServerRatis.java:805)
>     at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler.handle(CreatePipelineCommandHandler.java:92)
>     at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
>     at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:506)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Input/output error
>     at java.io.UnixFileSystem.canonicalize0(Native Method)
>     at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
>     at java.io.File.getCanonicalPath(File.java:620)
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:129)
>     at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:95)
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:65)
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:51)
>     at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:112)
>     at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     ... 1 more
> ```
> RaftServer does not catch the IOException and just throws it, so creating the pipeline fails.
> When multiple storageDirs are configured, we could instead try the other dirs.
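
For illustration only, the fallback suggested in the description could look roughly like the sketch below. It is not the Ratis implementation and does not use any Ratis API: the class `StorageDirFallback`, the `DirProbe` interface, and the method names are all hypothetical. The only grounded detail is that `File.getCanonicalPath()` is the call that throws in the stack trace, so the default probe exercises it. Failures from each rejected dir are kept as suppressed exceptions so the disk errors remain visible if every dir is bad.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Ratis): picks the first usable storage dir. */
final class StorageDirFallback {

  /** Hypothetical probe; stands in for whatever I/O the real storage initialization performs. */
  @FunctionalInterface
  interface DirProbe {
    void check(File dir) throws IOException;
  }

  /** Assumed default probe: resolve the path, create the dir if needed, check it is writable. */
  static void defaultProbe(File dir) throws IOException {
    dir.getCanonicalPath(); // the call that failed with "Input/output error" in the report
    if (!dir.isDirectory() && !dir.mkdirs()) {
      throw new IOException("Cannot create storage dir " + dir);
    }
    if (!dir.canWrite()) {
      throw new IOException("Storage dir is not writable: " + dir);
    }
  }

  /**
   * Returns the first candidate that passes the probe. If none does, throws one
   * IOException carrying every per-dir failure as a suppressed exception.
   */
  static File chooseStorageDir(List<File> candidates, DirProbe probe) throws IOException {
    IOException failure = new IOException("No usable storage dir among " + candidates);
    for (File dir : candidates) {
      try {
        probe.check(dir);
        return dir;
      } catch (IOException e) {
        failure.addSuppressed(e); // remember why this dir was skipped, then try the next one
      }
    }
    throw failure;
  }

  public static void main(String[] args) throws IOException {
    // Simulated disks: the probe rejects any dir whose name starts with "bad".
    List<File> dirs = Arrays.asList(new File("bad-disk"), new File("good-disk"));
    File chosen = chooseStorageDir(dirs, d -> {
      if (d.getName().startsWith("bad")) {
        throw new IOException("Input/output error");
      }
    });
    System.out.println("Using storage dir: " + chosen);
  }
}
```

In a real server the probe would be `StorageDirFallback::defaultProbe` or something stricter. How a dir that fails later, after it has been selected, should be handled is a separate question that this sketch does not address.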