Posted to issues@ozone.apache.org by "Marton Elek (Jira)" <ji...@apache.org> on 2021/03/08 15:36:00 UTC

[jira] [Commented] (HDDS-4703) New OM couldn't be started due to NOT_FORMATTED Ratis dir

    [ https://issues.apache.org/jira/browse/HDDS-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297462#comment-17297462 ] 

Marton Elek commented on HDDS-4703:
-----------------------------------

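One possible root cause: in the log below, Ratis creates the storage directory, then the OM fails to bind its RPC server ("Unresolved address") and the process exits, probably before the directory is formatted. A later start would then find the existing but unformatted directory and fail with NOT_FORMATTED: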

{code}
city: 10000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
ozone-om-0 om 2021-03-08 15:31:54 INFO  RaftServerConfigKeys:44 - raft.server.log.corruption.policy = EXCEPTION (default)
ozone-om-0 om 2021-03-08 15:31:54 INFO  RaftStorageDirectory:132 - The storage directory /data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e does not exist. Creating ...
ozone-om-0 om 2021-03-08 15:31:54 ERROR OzoneManagerStarter:69 - OM start failed with exception
ozone-om-0 om java.net.SocketException: Call From ozone-om-0.ozone-om to null:0 failed on socket exception: java.net.SocketException: Unresolved address; For more details see:  http://wiki.apache.org/hadoop/SocketException
ozone-om-0 om 	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
ozone-om-0 om 	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
ozone-om-0 om 	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
ozone-om-0 om 	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
ozone-om-0 om 	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
ozone-om-0 om 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:836)
ozone-om-0 om 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:803)
ozone-om-0 om 	at org.apache.hadoop.ipc.Server.bind(Server.java:634)
ozone-om-0 om 	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1215)
ozone-om-0 om 	at org.apache.hadoop.ipc.Server.<init>(Server.java:3108)
ozone-om-0 om 	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1055)
ozone-om-0 om 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:426)
ozone-om-0 om 	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:347)
ozone-om-0 om 	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:912)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1318)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:493)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:937)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:125)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:79)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:67)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:38)
ozone-om-0 om 	at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
ozone-om-0 om 	at picocli.CommandLine.access$1100(CommandLine.java:145)
ozone-om-0 om 	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
ozone-om-0 om 	at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
ozone-om-0 om 	at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
ozone-om-0 om 	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
ozone-om-0 om 	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
ozone-om-0 om 	at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
ozone-om-0 om 	at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
ozone-om-0 om 	at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
ozone-om-0 om 	at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:51)
ozone-om-0 om Caused by: java.net.SocketException: Unresolved address
ozone-om-0 om 	at java.base/sun.nio.ch.Net.translateToSocketException(Net.java:189)
ozone-om-0 om Call From ozone-om-0.ozone-om to null:0 failed on socket exception: java.net.SocketException: Unresolved address; For more details see:  http://wiki.apache.org/hadoop/SocketException
ozone-om-0 om 	at java.base/sun.nio.ch.Net.translateException(Net.java:217)
ozone-om-0 om 	at java.base/sun.nio.ch.Net.translateException(Net.java:223)
ozone-om-0 om 	at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:90)
ozone-om-0 om 	at org.apache.hadoop.ipc.Server.bind(Server.java:617)
ozone-om-0 om 	... 25 more
ozone-om-0 om Caused by: java.nio.channels.UnresolvedAddressException
ozone-om-0 om 	at java.base/sun.nio.ch.Net.checkAddress(Net.java:149)
ozone-om-0 om 	at java.base/sun.nio.ch.Net.checkAddress(Net.java:157)
ozone-om-0 om 	at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:243)
ozone-om-0 om 	at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:88)
ozone-om-0 om 	... 26 more
ozone-om-0 om 2021-03-08 15:31:54 INFO  RaftStorageDirectory:215 - Lock on /data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e/in_use.lock acquired by nodename 66@ozone-om-0
ozone-om-0 om 2021-03-08 15:31:54 INFO  OzoneManagerStarter:124 - SHUTDOWN_MSG: 
ozone-om-0 om /************************************************************
ozone-om-0 om SHUTDOWN_MSG: Shutting down OzoneManager at ozone-om-0/10.42.3.81
ozone-om-0 om ************************************************************/
ozone-om-0 om Process exited with exit code 255
{code}
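The two innermost causes in the log can be reproduced with plain JDK calls. This is a minimal sketch, assuming the OM's bind address had not been resolved by DNS when the process started. InetSocketAddress.createUnresolved stands in for that failed lookup, and the host and port are borrowed from the default OM address in the init log quoted below; whether this is exactly the address that failed is an assumption. Binding through ServerSocketChannel.socket() follows the same ServerSocketAdaptor.bind path that appears in the trace.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.SocketException;
import java.nio.channels.ServerSocketChannel;

public class UnresolvedBindDemo {

    /**
     * Binds the ServerSocket view of a fresh ServerSocketChannel (the
     * ServerSocketAdaptor.bind path seen in the trace) and returns the
     * SocketException thrown, or null if the bind succeeds.
     */
    static SocketException tryBind(InetSocketAddress address) {
        try (ServerSocketChannel channel = ServerSocketChannel.open()) {
            channel.socket().bind(address, 128);
            return null;
        } catch (SocketException e) {
            return e;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: stands in for a hostname that DNS could not resolve yet.
        // createUnresolved() performs no lookup, so the result is deterministic.
        InetSocketAddress address =
                InetSocketAddress.createUnresolved("ozone-om-0.ozone-om", 9862);
        SocketException e = tryBind(address);
        // The JDK rejects the address before attempting the bind: the
        // UnresolvedAddressException from Net.checkAddress is translated into
        // SocketException("Unresolved address"), as in the OM log.
        System.out.println(e + " / cause: " + e.getCause());
    }
}
```

In the real OM the address would typically come from a normal hostname lookup that failed, which likewise leaves the InetSocketAddress unresolved; createUnresolved is used here only to make the failure reproducible without DNS.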

> New OM couldn't be started due to NOT_FORMATTED Ratis dir 
> ----------------------------------------------------------
>
>                 Key: HDDS-4703
>                 URL: https://issues.apache.org/jira/browse/HDDS-4703
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Priority: Critical
>         Attachments: om.init.log, om.log
>
>
> Using the same scripts as before, the OM can no longer be started:
> Version: 
> {code}
> Source code repository git@github.com:apache/ozone.git -r 159b0c61c3264c9c3c3e1e6e94ef853e31138557
> {code} 
> The Ozone init was successful:
> {code}
> ************************************************************/
> 2021-01-14 16:01:41 INFO  OzoneManagerStarter:90 - registered UNIX signal handlers for [TERM, HUP, INT]
> 2021-01-14 16:01:41 INFO  OMHANodeDetails:104 - ozone.om.internal.service.id is not defined, falling back to ozone.om.service.ids to find serviceID for OzoneManager if it is HA enabled cluster
> 2021-01-14 16:01:41 INFO  OMHANodeDetails:210 - Configuration either no ozone.om.address set. Falling back to the default OM address ozone-om-0.ozone-om:9862
> 2021-01-14 16:01:41 INFO  OMHANodeDetails:238 - OM Service ID is not set. Setting it to the default ID: omServiceIdDefault
> 2021-01-14 16:01:41 WARN  ServerUtils:225 - ozone.om.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
> 2021-01-14 16:01:41 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> OM initialization succeeded.Current cluster id for sd=/data/metadata/om;cid=CID-4b397366-5296-4cfd-addd-e7cf94ceb846;layoutVersion=0
> 2021-01-14 16:01:41 INFO  OzoneManagerStarter:124 - SHUTDOWN_MSG: 
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at ozone-om-0.ozone-om.default.svc.cluster.local/10.42.3.3
> ************************************************************/
> {code}
> But the OM failed to start:
> {code}
> 2021-01-14 16:11:26 ERROR OzoneManagerStarter:69 - OM start failed with exception
> java.io.IOException: Cannot load Storage Directory /data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e. Its state: NOT_FORMATTED
>         at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:68)
>         at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:51)
>         at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:108)
>         at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:192)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Cannot load Storage Directory /data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e. Its state: NOT_FORMATTED
> {code}
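The suspected sequence (the storage directory is created, the process exits before formatting finishes, and the next start rejects the half-initialized directory) can be illustrated with a small, hypothetical model. This is not Ratis code: the State enum, the marker file name format-marker, and the rule "a missing directory is formatted, an existing one must already be formatted" are simplifying assumptions chosen to mirror the two logs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the suspected failure; not Ratis code. */
public class StorageStateSketch {

    enum State { NON_EXISTENT, NOT_FORMATTED, NORMAL }

    // Hypothetical marker, written only once "formatting" has finished.
    static final String FORMAT_MARKER = "format-marker";

    static State analyze(Path dir) {
        if (!Files.isDirectory(dir)) {
            return State.NON_EXISTENT;
        }
        return Files.exists(dir.resolve(FORMAT_MARKER)) ? State.NORMAL : State.NOT_FORMATTED;
    }

    /**
     * One simulated start. A missing directory is created and then formatted;
     * an existing one must already be formatted. With crashBeforeFormat set,
     * the simulated process stops right after creating the directory.
     */
    static String start(Path dir, boolean crashBeforeFormat) {
        try {
            State state = analyze(dir);
            if (state == State.NON_EXISTENT) {
                Files.createDirectories(dir);            // "does not exist. Creating ..."
                if (crashBeforeFormat) {
                    return "crashed after creating " + dir.getFileName();
                }
                Files.createFile(dir.resolve(FORMAT_MARKER)); // formatting completes
                return "started";
            }
            if (state == State.NOT_FORMATTED) {
                return "Cannot load Storage Directory " + dir.getFileName() + ". Its state: " + state;
            }
            return "started";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** First start crashes mid-initialization; the second finds the half-created directory. */
    static List<String> runScenario() {
        try {
            // The temporary directory is left behind; acceptable for a sketch.
            Path dir = Files.createTempDirectory("om-sketch").resolve("ratis");
            List<String> outcomes = new ArrayList<>();
            outcomes.add(start(dir, true));   // e.g. RPC bind fails after the directory exists
            outcomes.add(start(dir, false));  // restart: directory exists but was never formatted
            return outcomes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        runScenario().forEach(System.out::println);
    }
}
```

Running main prints the simulated crash, then the NOT_FORMATTED error for the same directory on the second start.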



