You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/02/03 16:59:06 UTC

[GitHub] [hadoop-ozone] sodonnel opened a new pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

sodonnel opened a new pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521
 
 
   ## What changes were proposed in this pull request?
   
   When the datanode state is change in SCMNodeManager a Datanode Command is created to transmit the new datanode state to the datanode. The datanode will then persist this state in the "datanodeID.yaml" file.
   
   The current datanode.yaml file simply persists a DatanodeDetails object in YAML format. Therefore two new fields have been added to DatanodeDetails to support this:
   
   persistedOpState,
   persistedOpStateExpiryEpochSec
   
   The first is the operational state and the second is the seconds from the epoch the setting should expire (for maintenance mode).
   
   When a datanode starts up, it will read these values and supply them in the heartbeat to SCM.
   
   If SCM notices the DN was registered before, SCM is seen as the source of truth and will update the DN state to that set in SCM if they differ.
   
   If the DN was never registered with SCM (eg if SCM was restarted), then the DN state is seen as the source of truth and SCM will be updated accordingly.
   
   This change also moves the state expiry time into the NodeStatus object, so it now carries operationalState, healthState and State expiry.
   
   Note that the DN does not do anything with the persisted operational state. They are simply persisted and reported in the heartbeat. In this respect the DNs are being used as a form of distributed storage for the operational state so the decommission process can survive a restart.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2592
   
   ## How was this patch tested?
   
   Tested manually using a docker build. This change still needs unit tests added if the current approach is agreed upon.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] anuengineer commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
anuengineer commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#discussion_r374850764
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/DatanodeIdYaml.java
 ##########
 @@ -82,6 +83,12 @@ public static DatanodeDetails readDatanodeIdFile(File path)
           .setIpAddress(datanodeDetailsYaml.getIpAddress())
           .setHostName(datanodeDetailsYaml.getHostName())
           .setCertSerialId(datanodeDetailsYaml.getCertSerialId());
+      if (datanodeDetailsYaml.getPersistedOpState() != null) {
+        builder.setPersistedOpState(HddsProtos.NodeOperationalState.valueOf(
+            datanodeDetailsYaml.getPersistedOpState()));
+      }
+      builder.setPersistedOpStateExpiry(
 
 Review comment:
   Shouldn't this be also inside the if != null ? , I am going to presume that even if we have no data, datanodeDetailsYaml.getPersistedOpStateExpiryEpochSec is going to return zero, but since the expiry is a long, JVM would initialize it to zero by default, would it not?. So curious why this check is outside the if. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#issuecomment-606217079
 
 
   We have parked the decommission work at the moment. Anu approved this change, but I feel we need some unit tests around the changes.
   
   As this is only going onto a branch which we will need to review in more detail later, I have raised HDDS-3303 to log the fact we need some tests and then I propose we commit this to the branch as is to get it out of the PR queue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] elek commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
elek commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#issuecomment-587378058
 
 
   /pending "I will get some tests added and check your comments in more detail."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] anuengineer commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
anuengineer commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#discussion_r374856136
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/DatanodeIdYaml.java
 ##########
 @@ -105,6 +112,8 @@ public static DatanodeDetails readDatanodeIdFile(File path)
     private String ipAddress;
     private String hostName;
     private String certSerialId;
+    private String persistedOpState;
+    private long persistedOpStateExpiryEpochSec = 0;
 
 Review comment:
   Since zero is the default value for the long, would it make sense to enforce time; that is for each set command we need an explicit time. In that case, we can define a constant called INFINITY and set it to something like -1; but in the source, it will be very explicit when someone reads code, and in the command handler we can enforce the requirement (or perhaps better on the SCM side). So commands like decommission will have an infinite value explicitly set by SCM and it would be more obvious and explicit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#issuecomment-582139686
 
 
   @anuengineer Thanks for the review on this. I probably need to add some tests around these changes. I did not do that initially, as I wanted to be sure we were happy with the approach (storing the state in the datanode details, allowing it to be included in the heartbeat and the DN not making further use of the value). Now you have reviewed and are happy with this direction, I will get some tests added and check your comments in more detail.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#discussion_r400464419
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/DatanodeIdYaml.java
 ##########
 @@ -105,6 +112,8 @@ public static DatanodeDetails readDatanodeIdFile(File path)
     private String ipAddress;
     private String hostName;
     private String certSerialId;
+    private String persistedOpState;
+    private long persistedOpStateExpiryEpochSec = 0;
 
 Review comment:
   I think this is a good suggestion so I have logged it in HDDS-3304 and we can consider it when the decommission work is started up again.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel merged pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#discussion_r374945844
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/DatanodeIdYaml.java
 ##########
 @@ -82,6 +83,12 @@ public static DatanodeDetails readDatanodeIdFile(File path)
           .setIpAddress(datanodeDetailsYaml.getIpAddress())
           .setHostName(datanodeDetailsYaml.getHostName())
           .setCertSerialId(datanodeDetailsYaml.getCertSerialId());
+      if (datanodeDetailsYaml.getPersistedOpState() != null) {
+        builder.setPersistedOpState(HddsProtos.NodeOperationalState.valueOf(
+            datanodeDetailsYaml.getPersistedOpState()));
+      }
+      builder.setPersistedOpStateExpiry(
 
 Review comment:
   If I recall (I wrote this code a month or so ago and forgot to create the PR) the reason the OpState is wrapped in a IF block is because I need take the string read from the YAML file and turn it into the enum. The enum didn't like me calling valueOf(null), so I only make that call if its non-null. All the other parameters in the builder are strings except for the two new ones (enum and long), so they never really had to handle the null case before.
   
   You are correct, in that if there is no data, the "getPersistedOpStateExpiryEpochSec" will end up as zero. That is OK, as SCM treats zero as "no expiry" right now.
   
   I've got similar logic at the end of that class too to handle the turning the enum into a string:
   
   ```
       String persistedOpString = null;
       if (datanodeDetails.getPersistedOpState() != null) {
         persistedOpString = datanodeDetails.getPersistedOpState().name();
       }
       return new DatanodeDetailsYaml(
           datanodeDetails.getUuid().toString(),
           datanodeDetails.getIpAddress(),
           datanodeDetails.getHostName(),
           datanodeDetails.getCertSerialId(),
           persistedOpString,
           datanodeDetails.getPersistedOpStateExpiryEpochSec(),
           portDetails);
     }
   ```
   
   I think the shorter summary is that the enums needs special handling around the nulls while the long and Strings behave OK without wrapping them in IF blocks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] anuengineer commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
anuengineer commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#discussion_r374853569
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/SetNodeOperationalStateCommandHandler.java
 ##########
 @@ -0,0 +1,157 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.container.common.statemachine.commandhandler;
+
+
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos;
+import org.apache.hadoop.hdds.scm.HddsServerUtil;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.ozone.container.common.helpers.ContainerUtils;
+import org.apache.hadoop.ozone.container.common.statemachine.SCMConnectionManager;
+import org.apache.hadoop.ozone.container.common.statemachine.StateContext;
+import org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer;
+import org.apache.hadoop.ozone.protocol.commands.SCMCommand;
+import org.apache.hadoop.ozone.protocol.commands.SetNodeOperationalStateCommand;
+import org.apache.hadoop.util.Time;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.hadoop.hdds.protocol.proto.
+    StorageContainerDatanodeProtocolProtos.SCMCommandProto.Type;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+
+/**
+ * Handle the SetNodeOperationalStateCommand sent from SCM to the datanode
+ * to persist the current operational state.
+ */
+public class SetNodeOperationalStateCommandHandler implements CommandHandler {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SetNodeOperationalStateCommandHandler.class);
+  private final Configuration conf;
+  private final AtomicInteger invocationCount = new AtomicInteger(0);
+  private final AtomicLong totalTime = new AtomicLong(0);
+
+  /**
+   * Set Node State command handler.
+   *
+   * @param conf - Configuration for the datanode.
+   */
+  public SetNodeOperationalStateCommandHandler(Configuration conf) {
+    this.conf = conf;
+  }
+
+  /**
+   * Handles a given SCM command.
+   *
+   * @param command - SCM Command
+   * @param container - Ozone Container.
+   * @param context - Current Context.
+   * @param connectionManager - The SCMs that we are talking to.
+   */
+  @Override
+  public void handle(SCMCommand command, OzoneContainer container,
+      StateContext context, SCMConnectionManager connectionManager) {
+    long startTime = Time.monotonicNow();
+    invocationCount.incrementAndGet();
+    StorageContainerDatanodeProtocolProtos.SetNodeOperationalStateCommandProto
+        setNodeCmdProto = null;
+
+    if (command.getType() != Type.setNodeOperationalStateCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          Type.setNodeOperationalStateCommand, command.getType());
+      return;
+    }
+    SetNodeOperationalStateCommand setNodeCmd =
+        (SetNodeOperationalStateCommand) command;
+    setNodeCmdProto = setNodeCmd.getProto();
+    DatanodeDetails dni = context.getParent().getDatanodeDetails();
+    dni.setPersistedOpState(setNodeCmdProto.getNodeOperationalState());
+    dni.setPersistedOpStateExpiryEpochSec(
+        setNodeCmd.getStateExpiryEpochSeconds());
+    try {
+      persistDatanodeDetails(dni);
+    } catch (IOException ioe) {
+      LOG.error("Failed to persist the datanode state", ioe);
+      // TODO - this should probably be raised, but it will break the command
 
 Review comment:
   We can add this to the Dn reports if you want to propagate this back into SCM. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#discussion_r374949300
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/DatanodeIdYaml.java
 ##########
 @@ -105,6 +112,8 @@ public static DatanodeDetails readDatanodeIdFile(File path)
     private String ipAddress;
     private String hostName;
     private String certSerialId;
+    private String persistedOpState;
+    private long persistedOpStateExpiryEpochSec = 0;
 
 Review comment:
   This might be a good idea. The current "infinity" value in SCM is zero, as it checks for greater than zero expiry right now, but it could be changed to a constant easily enough.
   
   It would be good for the command to always get a time so its consistent. I need to go back and refresh my memory on the command etc here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel edited a comment on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel edited a comment on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#issuecomment-606217079
 
 
   We have parked the decommission work at the moment. Anu approved this change, but I feel we need some unit tests around the changes.
   
   I have raised HDDS-3303 to log the fact we need some tests. I also raised HDDS-3304 for the "use INFINITY as a constant" suggestion.
   
   As this is only going onto a branch which we will need to review in more detail later, I propose we commit this to the branch as is to get it out of the PR queue and leave the decommission work parked in a good state.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state

Posted by GitBox <gi...@apache.org>.
sodonnel commented on issue #521: HDDS-2592 Add Datanode command to allow the datanode to persist its admin state
URL: https://github.com/apache/hadoop-ozone/pull/521#issuecomment-606221528
 
 
   /ready

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org