Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/10/20 20:35:10 UTC

[jira] [Updated] (HDDS-830) Datanode should not start XceiverServerRatis before getting version information from SCM

     [ https://issues.apache.org/jira/browse/HDDS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-830:
----------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

I am managing the 1.2.0 release and we currently have more than 600 issues targeted for 1.2.0. I am moving the target field to 1.3.0.

If you are actively working on this Jira and believe it should be targeted for the 1.2.0 release, please reach out to me via Apache email or Slack.

> Datanode should not start XceiverServerRatis before getting version information from SCM
> ----------------------------------------------------------------------------------------
>
>                 Key: HDDS-830
>                 URL: https://issues.apache.org/jira/browse/HDDS-830
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.3.0
>            Reporter: Nandakumar
>            Priority: Major
>              Labels: TriagePending
>
> If a datanode restarts quickly, before SCM detects the restart, it rejoins the Ratis ring (the existing pipeline). Because SCM did not detect the restart, the pipeline is not closed. There is then a window between the datanode starting and it receiving the version information from SCM, during which the SCM ID on the datanode is not set (null). If a client uses this pipeline during that window, the container state machine throws {{java.lang.NullPointerException: scmId cannot be null}}. This causes {{RaftLogWorker}} to terminate, which crashes the datanode.
> {code}
> 2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: 407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
> java.io.IOException: java.lang.NullPointerException: scmId cannot be null
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
>   at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
>   at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
>   at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: scmId cannot be null
>   at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
>   at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
>   at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   ... 1 more
> {code}
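
The fix named in the issue title is to defer starting XceiverServerRatis until the version information, and with it the SCM ID, has arrived from SCM. The following is a minimal Java sketch of that ordering only, assuming a `CompletableFuture`-based gate. `RatisStartupGate`, `onVersionReceived` and `startRatisServer` are hypothetical names, not Ozone APIs, and the eventual fix in Ozone may be structured quite differently.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: start the Ratis server only after the datanode has
 * received the SCM ID from SCM. None of these names are real Ozone classes.
 */
public class RatisStartupGate {

    // Completed exactly once, when the (hypothetical) version call to SCM succeeds.
    private final CompletableFuture<String> scmIdFuture = new CompletableFuture<>();

    // Stand-in for "XceiverServerRatis has been started".
    private final AtomicBoolean ratisStarted = new AtomicBoolean(false);

    private volatile String scmId;

    public RatisStartupGate() {
        // Chain the server start on the SCM ID becoming known instead of
        // starting it unconditionally at datanode boot. The non-async
        // callback runs on the thread that completes the future.
        scmIdFuture.thenAccept(id -> {
            this.scmId = id;      // set the ID first ...
            startRatisServer();   // ... then accept pipeline traffic
        });
    }

    /** Called by a hypothetical version-endpoint task once SCM replies. */
    public void onVersionReceived(String id) {
        // complete() is a no-op after the first call, so the server starts once.
        scmIdFuture.complete(Objects.requireNonNull(id, "scmId cannot be null"));
    }

    private void startRatisServer() {
        // A real datanode would start its Ratis server here; this sketch
        // only records that the start happened.
        ratisStarted.set(true);
    }

    public boolean isRatisStarted() {
        return ratisStarted.get();
    }

    public String getScmId() {
        return scmId;
    }

    public static void main(String[] args) {
        RatisStartupGate gate = new RatisStartupGate();
        System.out.println("started before version: " + gate.isRatisStarted());
        gate.onVersionReceived("scm-1234");
        System.out.println("started after version: " + gate.isRatisStarted());
    }
}
```

Running `main` prints `started before version: false` followed by `started after version: true`. Because the server start is a dependent of the future rather than a separate boot step, there is no window in which the pipeline can receive a create-container request while `scmId` is still null, and a repeated version response cannot start the server twice.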



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
