Posted to issues@ozone.apache.org by "Tejaskriya (via GitHub)" <gi...@apache.org> on 2024/01/24 08:33:49 UTC

[PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Tejaskriya opened a new pull request, #6083:
URL: https://github.com/apache/ozone/pull/6083

   ## What changes were proposed in this pull request?
   To track the progress of a datanode's decommissioning, the decommission status command needs to show the number of pipelines associated with the datanode and the number of containers on it that are blocking decommissioning (i.e., unhealthy and under-replicated containers).
   These counts, along with the time at which decommissioning started for the datanode, are stored as metrics in NodeDecommissionMetrics. This PR queries the SCM JMX endpoint for the NodeDecommissionMetrics bean and parses the response to display the counts and start time for each node currently in the DECOMMISSIONING state.
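
As an illustration of the parsing step, here is a minimal sketch (not the code in this PR). It assumes the JMX response has already been parsed into a flat `Map<String, Object>` whose numeric values are `Double`s, and it reuses the per-node key names that appear in the diff quoted later in this thread (`tag.datanode.N`, `PipelinesWaitingToCloseDN.N`, `UnderReplicatedDN.N`, `UnclosedContainersDN.N`, `StartTimeDN.N`). The class name `DecommissionCounts` and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder for the per-node values read from a flattened metrics map. */
final class DecommissionCounts {
  final int pipelines;
  final int underReplicated;
  final int unclosed;
  final long startTimeMillis;

  DecommissionCounts(int pipelines, int underReplicated, int unclosed, long startTimeMillis) {
    this.pipelines = pipelines;
    this.underReplicated = underReplicated;
    this.unclosed = unclosed;
    this.startTimeMillis = startTimeMillis;
  }

  /** Returns the metric as a Number, or null if the key is absent or not numeric. */
  private static Number number(Map<String, Object> metrics, String key) {
    Object value = metrics.get(key);
    return value instanceof Number ? (Number) value : null;
  }

  /** Finds the metrics index whose datanode tag matches hostName and reads its counts. */
  static Optional<DecommissionCounts> find(Map<String, Object> metrics, String hostName, int numDecomNodes) {
    for (int i = 1; i <= numDecomNodes; i++) {
      if (!hostName.equals(metrics.get("tag.datanode." + i))) {
        continue;
      }
      Number pipelines = number(metrics, "PipelinesWaitingToCloseDN." + i);
      Number underReplicated = number(metrics, "UnderReplicatedDN." + i);
      Number unclosed = number(metrics, "UnclosedContainersDN." + i);
      Number startTime = number(metrics, "StartTimeDN." + i);
      if (pipelines == null || underReplicated == null || unclosed == null || startTime == null) {
        return Optional.empty(); // node found, but its metrics are incomplete
      }
      return Optional.of(new DecommissionCounts(pipelines.intValue(),
          underReplicated.intValue(), unclosed.intValue(), startTime.longValue()));
    }
    return Optional.empty(); // no metrics entry for this host
  }

  /** Sample data (made up for this sketch) shaped like the flattened metrics. */
  static Map<String, Object> sample() {
    Map<String, Object> metrics = new HashMap<>();
    metrics.put("tag.datanode.1", "ozone-datanode-4.ozone_default");
    metrics.put("PipelinesWaitingToCloseDN.1", 1.0);
    metrics.put("UnderReplicatedDN.1", 2.0);
    metrics.put("UnclosedContainersDN.1", 1.0);
    metrics.put("StartTimeDN.1", 1705582461191.0);
    return metrics;
  }

  public static void main(String[] args) {
    Optional<DecommissionCounts> counts = find(sample(), "ozone-datanode-4.ozone_default", 1);
    if (counts.isPresent()) {
      System.out.println("No. of Pipelines: " + counts.get().pipelines);
      System.out.println("No. of UnderReplicated containers: " + counts.get().underReplicated);
      System.out.println("No. of Unclosed Containers: " + counts.get().unclosed);
    } else {
      System.err.println("No complete metrics found for this datanode");
    }
  }
}
```

Returning an `Optional` instead of catching `NullPointerException` is a design choice of this sketch: a missing key and a missing node both surface as an empty result.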
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-9738
   
   ## How was this patch tested?
   
   Tested locally in a Docker setup:
   ```
   $ ozone admin datanode status decommission
   
   Decommission Status: DECOMMISSIONING - 1 node(s)
   
   Datanode: e56afcce-f5b5-4980-8b1b-d55a5714ad3c (/default-rack/172.21.0.9/ozone-datanode-4.ozone_default)
   Decommission started at : 18/01/2024 12:54:21 UTC
   No. of Pipelines: 1
   No. of UnderReplicated containers: 2
   No. of Unclosed Containers: 1
   {UnderReplicated=[#5,#6], UnClosed=[#10]}
   ```
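
A note on formatting the start time, as a hedged sketch rather than the PR's code: the diff quoted later in this thread uses `SimpleDateFormat` with the pattern letters `hh`, which denote a 12-hour clock, so without an AM/PM marker an afternoon time such as 13:54 would print as 01:54. The sketch below uses `java.time` with `HH` (24-hour) and a fixed UTC zone. The class name `StartTimeFormat` is hypothetical, and the epoch value is chosen to match the 18/01/2024 12:54:21 UTC timestamp.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that renders an epoch-millisecond start time in UTC. */
final class StartTimeFormat {
  // HH is the 24-hour field; 'UTC' is a literal because the zone is fixed below.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss 'UTC'").withZone(ZoneOffset.UTC);

  static String format(long epochMillis) {
    return FORMAT.format(Instant.ofEpochMilli(epochMillis));
  }

  public static void main(String[] args) {
    // Prints: Decommission started at : 18/01/2024 12:54:21 UTC
    System.out.println("Decommission started at : " + format(1705582461191L));
  }
}
```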


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "Tejaskriya (via GitHub)" <gi...@apache.org>.
Tejaskriya closed pull request #6083: HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode
URL: https://github.com/apache/ozone/pull/6083




Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "Tejaskriya (via GitHub)" <gi...@apache.org>.
Tejaskriya commented on code in PR #6083:
URL: https://github.com/apache/ozone/pull/6083#discussion_r1470631438


##########
hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DecommissionStatusSubCommand.java:
##########
@@ -94,4 +133,95 @@ private void printDetails(DatanodeDetails datanode) {
         " (" + datanode.getNetworkLocation() + "/" + datanode.getIpAddress()
         + "/" + datanode.getHostName() + ")");
   }
+  private void printCounts(DatanodeDetails datanode, Map<String, Object> counts, int numDecomNodes) {
+    try {
+      for (int i = 1; i <= numDecomNodes; i++) {
+        if (datanode.getHostName().equals(counts.get("tag.datanode." + i))) {
+          int pipelines = ((Double)counts.get("PipelinesWaitingToCloseDN." + i)).intValue();
+          int underReplicated = ((Double)counts.get("UnderReplicatedDN." + i)).intValue();
+          int unclosed = ((Double)counts.get("UnclosedContainersDN." + i)).intValue();
+          long startTime = ((Double)counts.get("StartTimeDN." + i)).longValue();
+          System.out.print("Decommission started at : ");
+          Date date = new Date(startTime);
+          DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss z");
+          System.out.println(formatter.format(date));
+          System.out.println("No. of Pipelines: " + pipelines);
+          System.out.println("No. of UnderReplicated containers: " + underReplicated);
+          System.out.println("No. of Unclosed Containers: " + unclosed);
+          return;
+        }
+      }
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    } catch (NullPointerException ex) {
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    }
+  }
+
+  private Map<String, Object> getCounts() {
+    Map<String, Object> finalResult = new HashMap<>();
+    try {
+      StringBuffer url = new StringBuffer();
+      final OzoneConfiguration ozoneConf = parent
+          .getParent()
+          .getParent()
+          .getOzoneConf();
+      final String protocol;
+      final URLConnectionFactory connectionFactory = URLConnectionFactory.newDefaultURLConnectionFactory(ozoneConf);
+      final HttpConfig.Policy webPolicy = getHttpPolicy(ozoneConf);
+      String host;
+      InputStream inputStream;
+      int errorCode;
+
+      if (webPolicy.isHttpsEnabled()) {
+        protocol = HTTPS_SCHEME;
+        host = ozoneConf.get(OZONE_SCM_HTTPS_ADDRESS_KEY,
+            OZONE_SCM_HTTP_BIND_HOST_DEFAULT + OZONE_SCM_HTTPS_BIND_PORT_DEFAULT);
+        url.append(protocol).append("://").append(host).append("/jmx")
+            .append("?qry=Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics");
+
+        HttpsURLConnection httpsURLConnection = (HttpsURLConnection) connectionFactory
+            .openConnection(new URL(url.toString()));
+        httpsURLConnection.connect();
+        errorCode = httpsURLConnection.getResponseCode();
+        inputStream = httpsURLConnection.getInputStream();
+      } else {
+        protocol = HTTP_SCHEME;
+        host = ozoneConf.get(OZONE_SCM_HTTP_ADDRESS_KEY,
+            OZONE_SCM_HTTP_BIND_HOST_DEFAULT + OZONE_SCM_HTTP_BIND_PORT_DEFAULT);

Review Comment:
   Oh, that's right. Thanks for catching that! I have fixed it now.





Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6083:
URL: https://github.com/apache/ozone/pull/6083#discussion_r1469738359


##########
hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DecommissionStatusSubCommand.java:
##########
@@ -94,4 +133,95 @@ private void printDetails(DatanodeDetails datanode) {
         " (" + datanode.getNetworkLocation() + "/" + datanode.getIpAddress()
         + "/" + datanode.getHostName() + ")");
   }
+  private void printCounts(DatanodeDetails datanode, Map<String, Object> counts, int numDecomNodes) {
+    try {
+      for (int i = 1; i <= numDecomNodes; i++) {
+        if (datanode.getHostName().equals(counts.get("tag.datanode." + i))) {
+          int pipelines = ((Double)counts.get("PipelinesWaitingToCloseDN." + i)).intValue();
+          int underReplicated = ((Double)counts.get("UnderReplicatedDN." + i)).intValue();
+          int unclosed = ((Double)counts.get("UnclosedContainersDN." + i)).intValue();
+          long startTime = ((Double)counts.get("StartTimeDN." + i)).longValue();
+          System.out.print("Decommission started at : ");
+          Date date = new Date(startTime);
+          DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss z");
+          System.out.println(formatter.format(date));
+          System.out.println("No. of Pipelines: " + pipelines);
+          System.out.println("No. of UnderReplicated containers: " + underReplicated);
+          System.out.println("No. of Unclosed Containers: " + unclosed);
+          return;
+        }
+      }
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    } catch (NullPointerException ex) {
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    }
+  }
+
+  private Map<String, Object> getCounts() {
+    Map<String, Object> finalResult = new HashMap<>();
+    try {
+      StringBuffer url = new StringBuffer();
+      final OzoneConfiguration ozoneConf = parent
+          .getParent()
+          .getParent()
+          .getOzoneConf();
+      final String protocol;
+      final URLConnectionFactory connectionFactory = URLConnectionFactory.newDefaultURLConnectionFactory(ozoneConf);
+      final HttpConfig.Policy webPolicy = getHttpPolicy(ozoneConf);
+      String host;
+      InputStream inputStream;
+      int errorCode;
+
+      if (webPolicy.isHttpsEnabled()) {

Review Comment:
   This is a good idea, to pull the metrics rather than having a special command for getting these details.
   
   Does this work if Kerberos is enabled and the SCM web UI has Kerberos authentication enabled too?
   
   Also, what about HA SCM? We need to get the metrics from the active SCM, not the standbys as they will not have the correct metrics. 





Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "Tejaskriya (via GitHub)" <gi...@apache.org>.
Tejaskriya commented on code in PR #6083:
URL: https://github.com/apache/ozone/pull/6083#discussion_r1483915856


##########
hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DecommissionStatusSubCommand.java:
##########
@@ -94,4 +133,95 @@ private void printDetails(DatanodeDetails datanode) {
         " (" + datanode.getNetworkLocation() + "/" + datanode.getIpAddress()
         + "/" + datanode.getHostName() + ")");
   }
+  private void printCounts(DatanodeDetails datanode, Map<String, Object> counts, int numDecomNodes) {
+    try {
+      for (int i = 1; i <= numDecomNodes; i++) {
+        if (datanode.getHostName().equals(counts.get("tag.datanode." + i))) {
+          int pipelines = ((Double)counts.get("PipelinesWaitingToCloseDN." + i)).intValue();
+          int underReplicated = ((Double)counts.get("UnderReplicatedDN." + i)).intValue();
+          int unclosed = ((Double)counts.get("UnclosedContainersDN." + i)).intValue();
+          long startTime = ((Double)counts.get("StartTimeDN." + i)).longValue();
+          System.out.print("Decommission started at : ");
+          Date date = new Date(startTime);
+          DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss z");
+          System.out.println(formatter.format(date));
+          System.out.println("No. of Pipelines: " + pipelines);
+          System.out.println("No. of UnderReplicated containers: " + underReplicated);
+          System.out.println("No. of Unclosed Containers: " + unclosed);
+          return;
+        }
+      }
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    } catch (NullPointerException ex) {
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    }
+  }
+
+  private Map<String, Object> getCounts() {
+    Map<String, Object> finalResult = new HashMap<>();
+    try {
+      StringBuffer url = new StringBuffer();
+      final OzoneConfiguration ozoneConf = parent
+          .getParent()
+          .getParent()
+          .getOzoneConf();
+      final String protocol;
+      final URLConnectionFactory connectionFactory = URLConnectionFactory.newDefaultURLConnectionFactory(ozoneConf);
+      final HttpConfig.Policy webPolicy = getHttpPolicy(ozoneConf);
+      String host;
+      InputStream inputStream;
+      int errorCode;
+
+      if (webPolicy.isHttpsEnabled()) {

Review Comment:
   To solve these issues, as you suggested during our discussions, a better way would be to have something in Ozone similar to JMXJsonServlet from the hadoop-common library, which can return any filtered metrics through gRPC calls. This way we avoid the issues that come with HTTP calls, such as handling security and finding the SCM leader to get the right metrics. We would still be getting the metrics from the MBeanServer, without adding any significant overhead to SCM.
   I have raised this PR with this new approach: [#6185](https://github.com/apache/ozone/pull/6185)
   Please do review it. Thank you!
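
For readers unfamiliar with reading metrics from the MBean server in-process, here is a minimal sketch using only the standard `javax.management` API. It is not the implementation in #6185. The class name `MBeanAttributeReader` is hypothetical, and because the NodeDecommissionMetrics bean only exists inside a running SCM, the demo queries the JVM's built-in `java.lang:type=Runtime` bean instead. Inside SCM, the pattern would presumably be the object name used in the JMX query quoted in the diff (`Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics`).

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper that reads all readable attributes of matching MBeans. */
final class MBeanAttributeReader {
  static Map<String, Object> read(String namePattern) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    Map<String, Object> result = new TreeMap<>();
    try {
      for (ObjectName name : server.queryNames(new ObjectName(namePattern), null)) {
        for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
          if (!attr.isReadable()) {
            continue;
          }
          try {
            result.put(attr.getName(), server.getAttribute(name, attr.getName()));
          } catch (Exception e) {
            // Some attributes are unsupported on a given JVM; skip them.
          }
        }
      }
    } catch (JMException e) {
      throw new IllegalStateException("Could not query " + namePattern, e);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> runtime = read("java.lang:type=Runtime");
    System.out.println("VmName = " + runtime.get("VmName"));
    System.out.println("Uptime = " + runtime.get("Uptime"));
  }
}
```

Because the lookup runs inside the process that owns the metrics, no HTTP connection, web UI authentication, or leader discovery is involved at this step.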





Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #6083:
URL: https://github.com/apache/ozone/pull/6083#issuecomment-1913109092

   @Tejaskriya there is one more findbugs error:
   
   ```
   H I Dm: Found reliance on default encoding in org.apache.hadoop.hdds.scm.cli.datanode.TestDecommissionStatusSubCommand$1.handle(HttpExchange): String.getBytes()  At TestDecommissionStatusSubCommand.java:[line 87]
   ```
   
   https://github.com/Tejaskriya/ozone/actions/runs/7650664460/job/20847133955
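
For context, this findbugs warning is raised when `String.getBytes()` is called without a charset, since the result then depends on the platform default encoding. A minimal sketch of the usual remedy follows. The class name `ExplicitCharset` and the sample JSON body are hypothetical, not taken from the test in this PR.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing encoding and decoding with an explicit charset. */
final class ExplicitCharset {
  static byte[] encode(String body) {
    // Naming the charset makes the byte output identical on every platform.
    return body.getBytes(StandardCharsets.UTF_8);
  }

  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String body = "{\"beans\":[]}";
    byte[] payload = encode(body);
    System.out.println(payload.length + " bytes, round trip ok: " + decode(payload).equals(body));
  }
}
```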




Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6083:
URL: https://github.com/apache/ozone/pull/6083#issuecomment-1910589016

   This is still marked draft - is it ready for review now?




Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "Tejaskriya (via GitHub)" <gi...@apache.org>.
Tejaskriya commented on PR #6083:
URL: https://github.com/apache/ozone/pull/6083#issuecomment-1913097148

   @sodonnel yes, it is ready for review now




Re: [PR] HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6083:
URL: https://github.com/apache/ozone/pull/6083#discussion_r1469733534


##########
hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DecommissionStatusSubCommand.java:
##########
@@ -94,4 +133,95 @@ private void printDetails(DatanodeDetails datanode) {
         " (" + datanode.getNetworkLocation() + "/" + datanode.getIpAddress()
         + "/" + datanode.getHostName() + ")");
   }
+  private void printCounts(DatanodeDetails datanode, Map<String, Object> counts, int numDecomNodes) {
+    try {
+      for (int i = 1; i <= numDecomNodes; i++) {
+        if (datanode.getHostName().equals(counts.get("tag.datanode." + i))) {
+          int pipelines = ((Double)counts.get("PipelinesWaitingToCloseDN." + i)).intValue();
+          int underReplicated = ((Double)counts.get("UnderReplicatedDN." + i)).intValue();
+          int unclosed = ((Double)counts.get("UnclosedContainersDN." + i)).intValue();
+          long startTime = ((Double)counts.get("StartTimeDN." + i)).longValue();
+          System.out.print("Decommission started at : ");
+          Date date = new Date(startTime);
+          DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss z");
+          System.out.println(formatter.format(date));
+          System.out.println("No. of Pipelines: " + pipelines);
+          System.out.println("No. of UnderReplicated containers: " + underReplicated);
+          System.out.println("No. of Unclosed Containers: " + unclosed);
+          return;
+        }
+      }
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    } catch (NullPointerException ex) {
+      System.err.println("Error getting pipeline and container counts for " + datanode.getHostName());
+    }
+  }
+
+  private Map<String, Object> getCounts() {
+    Map<String, Object> finalResult = new HashMap<>();
+    try {
+      StringBuffer url = new StringBuffer();
+      final OzoneConfiguration ozoneConf = parent
+          .getParent()
+          .getParent()
+          .getOzoneConf();
+      final String protocol;
+      final URLConnectionFactory connectionFactory = URLConnectionFactory.newDefaultURLConnectionFactory(ozoneConf);
+      final HttpConfig.Policy webPolicy = getHttpPolicy(ozoneConf);
+      String host;
+      InputStream inputStream;
+      int errorCode;
+
+      if (webPolicy.isHttpsEnabled()) {
+        protocol = HTTPS_SCHEME;
+        host = ozoneConf.get(OZONE_SCM_HTTPS_ADDRESS_KEY,
+            OZONE_SCM_HTTP_BIND_HOST_DEFAULT + OZONE_SCM_HTTPS_BIND_PORT_DEFAULT);
+        url.append(protocol).append("://").append(host).append("/jmx")
+            .append("?qry=Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics");
+
+        HttpsURLConnection httpsURLConnection = (HttpsURLConnection) connectionFactory
+            .openConnection(new URL(url.toString()));
+        httpsURLConnection.connect();
+        errorCode = httpsURLConnection.getResponseCode();
+        inputStream = httpsURLConnection.getInputStream();
+      } else {
+        protocol = HTTP_SCHEME;
+        host = ozoneConf.get(OZONE_SCM_HTTP_ADDRESS_KEY,
+            OZONE_SCM_HTTP_BIND_HOST_DEFAULT + OZONE_SCM_HTTP_BIND_PORT_DEFAULT);

Review Comment:
   Does there need to be a ":" to combine the default_host:port ?
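
To make the concern concrete, here is a small sketch of what plain concatenation produces. The constant values below are assumptions for illustration (a string bind host of `0.0.0.0` and an integer port of `9876`), and the class name `HostPortDefault` is hypothetical. Check the actual definitions of `OZONE_SCM_HTTP_BIND_HOST_DEFAULT` and `OZONE_SCM_HTTP_BIND_PORT_DEFAULT` before relying on them.

```java
/** Hypothetical illustration of building a default host:port string. */
final class HostPortDefault {
  // Assumed values, for illustration only.
  static final String BIND_HOST_DEFAULT = "0.0.0.0";
  static final int HTTP_BIND_PORT_DEFAULT = 9876;

  /** Mirrors the concatenation in the diff: host and port run together. */
  static String withoutSeparator() {
    return BIND_HOST_DEFAULT + HTTP_BIND_PORT_DEFAULT;
  }

  /** Adds the ":" separator so the result is a valid host:port authority. */
  static String withSeparator() {
    return BIND_HOST_DEFAULT + ":" + HTTP_BIND_PORT_DEFAULT;
  }

  public static void main(String[] args) {
    System.out.println(withoutSeparator()); // 0.0.0.09876
    System.out.println(withSeparator());    // 0.0.0.0:9876
  }
}
```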


