Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2021/07/29 13:12:00 UTC

[jira] [Assigned] (HDDS-5514) Skip check for UNHEALTHY containers for datanode finalize.

     [ https://issues.apache.org/jira/browse/HDDS-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Gui reassigned HDDS-5514:
------------------------------

    Assignee: Mark Gui

> Skip check for UNHEALTHY containers for datanode finalize.
> ----------------------------------------------------------
>
>                 Key: HDDS-5514
>                 URL: https://issues.apache.org/jira/browse/HDDS-5514
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>
> Here is a log that we got from a non-rolling upgrade:
> local/master(0766d2cd23afb29f0eb42cf95b09d3d2984c14fa) -> upstream/master(57d42b12d3b6451e2ac8519780e82993ecce3611)
> {code:java}
> 2021-07-27 20:49:48,491 [Command processor thread] INFO org.apache.hadoop.ozone.upgrade.UpgradeFinalizer: Finalization started.
> 2021-07-27 20:49:48,502 [Command processor thread] WARN org.apache.hadoop.ozone.upgrade.UpgradeFinalizer: FinalizeUpgrade : Waiting for container to close, current state is: UNHEALTHY
> 2021-07-27 20:49:48,503 [Command processor thread] INFO org.apache.hadoop.ozone.upgrade.UpgradeFinalizer: Pre Finalization checks failed on the DataNode.
> 2021-07-27 20:49:48,503 [Command processor thread] WARN org.apache.hadoop.ozone.upgrade.DefaultUpgradeFinalizationExecutor: Upgrade Finalization failed with following Exception. 
> PREFINALIZE_VALIDATION_FAILED org.apache.hadoop.ozone.upgrade.UpgradeException: Pre Finalization checks failed on the DataNode.
>         at org.apache.hadoop.ozone.container.upgrade.DataNodeUpgradeFinalizer.preFinalizeUpgrade(DataNodeUpgradeFinalizer.java:55)
>         at org.apache.hadoop.ozone.container.upgrade.DataNodeUpgradeFinalizer.preFinalizeUpgrade(DataNodeUpgradeFinalizer.java:39)
>         at org.apache.hadoop.ozone.upgrade.DefaultUpgradeFinalizationExecutor.execute(DefaultUpgradeFinalizationExecutor.java:48)
>         at org.apache.hadoop.ozone.upgrade.BasicUpgradeFinalizer.finalize(BasicUpgradeFinalizer.java:75)
>         at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.finalizeUpgrade(DatanodeStateMachine.java:622)
>         at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.FinalizeNewLayoutVersionCommandHandler.handle(FinalizeNewLayoutVersionCommandHandler.java:78)
>         at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
>         at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:551)
>         at java.lang.Thread.run(Thread.java:748)
> 2021-07-27 20:49:48,503 [Command processor thread] INFO org.apache.hadoop.ozone.container.common.statemachine.commandhandler.FinalizeNewLayoutVersionCommandHandler: Processing FinalizeNewLayoutVersionCommandHandler command.
> 2021-07-27 20:49:48,503 [Command processor thread] INFO org.apache.hadoop.ozone.container.common.statemachine.commandhandler.FinalizeNewLayoutVersionCommandHandler: Finalize Upgrade called!
> {code}
> Finalization on the datanode checks whether any container is in a non-closed state (OPEN, CLOSING or UNHEALTHY) and fails if it finds one:
> {code:java}
> // DataNodeUpgradeFinalizer.java
> private boolean canFinalizeDataNode(DatanodeStateMachine dsm) {
>   // Lets be sure that we do not have any open container before we return
>   // from here. This function should be called in its own finalizer thread
>   // context.
>   Iterator<Container<?>> containerIt =
>       dsm.getContainer().getController().getContainers();
>   while (containerIt.hasNext()) {
>     Container ctr = containerIt.next();
>     ContainerProtos.ContainerDataProto.State state = ctr.getContainerState();
>     switch (state) {
>     case OPEN:
>     case CLOSING:
>     case UNHEALTHY:
>       LOG.warn("FinalizeUpgrade : Waiting for container to close, current "
>           + "state is: {}", state);
>       return false;
>     default:
>       continue;
>     }
>   }
>   return true;
> }
> {code}
> In practice, however, a cluster may hold a good many containers in the UNHEALTHY state; that is the case in our deployment, which has about 400000 containers.
>  
> Not all layout features require every container to be out of the UNHEALTHY state. SCM_HA, and potential features such as merging RocksDB instances on the datanode, do not touch the container layout at all.
> We may also want to do the non-rolling upgrade first and fix the UNHEALTHY containers afterwards. The replication manager may handle them eventually, but that takes a long time.
>  
> So I suggest adding a flag that makes it possible to turn off the check for UNHEALTHY containers.
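
The suggested flag could look roughly like the sketch below. It is not the actual patch: the State enum is a simplified stand-in for ContainerProtos.ContainerDataProto.State, and the class name FinalizeCheckSketch and the skipUnhealthyCheck parameter are hypothetical names chosen for illustration. How such a flag would be exposed in the datanode configuration is left open.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class FinalizeCheckSketch {

  // Simplified stand-in for ContainerProtos.ContainerDataProto.State.
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  /**
   * Returns true when no container is in a state that blocks finalization.
   * skipUnhealthyCheck is a hypothetical flag, not an existing Ozone option.
   */
  static boolean canFinalize(List<State> containerStates,
      boolean skipUnhealthyCheck) {
    // OPEN and CLOSING always block; UNHEALTHY blocks only if not skipped.
    Set<State> blocking = EnumSet.of(State.OPEN, State.CLOSING);
    if (!skipUnhealthyCheck) {
      blocking.add(State.UNHEALTHY);
    }
    for (State state : containerStates) {
      if (blocking.contains(state)) {
        System.out.println("FinalizeUpgrade : Waiting for container to "
            + "close, current state is: " + state);
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<State> states =
        Arrays.asList(State.CLOSED, State.UNHEALTHY, State.QUASI_CLOSED);
    // Same container set, checked with the flag off and then on.
    System.out.println("strict check: " + canFinalize(states, false));
    System.out.println("skip UNHEALTHY: " + canFinalize(states, true));
  }
}
```

With the flag off, the behaviour matches the current check and a single UNHEALTHY container blocks finalization. With the flag on, only OPEN and CLOSING containers block it, so the UNHEALTHY ones can be repaired after the upgrade.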


