Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/03/16 10:36:37 UTC

[GitHub] [ozone] elek opened a new pull request #2050: HDDS-4948. SCM-HA documentation

elek opened a new pull request #2050:
URL: https://github.com/apache/ozone/pull/2050


   ## What changes were proposed in this pull request?
   
   I created the first version of the SCM-HA doc. As I didn't participate in the development, I may be wrong. Please check it and feel free to extend it (the branch is r/w).
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4948
   
   ## How was this patch tested?
   
   ```
   cd hadoop-hdds/docs
   hugo serve
   firefox localhost:1313
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595769572



##########
File path: hadoop-hdds/docs/content/feature/OM-HA.md
##########
@@ -27,15 +27,15 @@ Ozone has two leader nodes (*Ozone Manager* for key space management and *Storag
 
 To avoid any single point of failure the leader nodes also should have a HA setup.
 
- 1. HA of Ozone Manager is implemented with the help of RAFT (Apache Ratis)
- 2. HA of Storage Container Manager is [under implementation]({{< ref "scmha.md">}})
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explain the HA setup of Ozone Manager (OM) HA, please check [this page[({{< ref "SCM-HA" >}})].  While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services.

Review comment:
       No, it's not valid.  It renders as:
   
   ```
   please check [this page[(/feature/scm-ha.html)].
   ```
   
   ```suggestion
   This document explains the HA setup of Ozone Manager (OM); please check [this page]({{< ref "SCM-HA" >}}) for SCM HA.  While they can be set up for HA independently, a reliable, full HA setup requires enabling HA for both services.
   ```
   
   Steps to verify:
   
   ```
   mvn -pl :hadoop-hdds-docs clean package
   open hadoop-hdds/docs/target/classes/docs/feature/om-ha.html
   ```
   
   






[GitHub] [ozone] GlenGeng commented on pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#issuecomment-800753771


   Hey Elek, thanks for the doc! Could you rebase onto the latest 2823, which has fixed several flaky test cases?




[GitHub] [ozone] amaliujia commented on pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#issuecomment-800457651


   I will also take a look at this PR today.




[GitHub] [ozone] GlenGeng commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595698722



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host1</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm2</name>
+   <value>host2</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm3</name>
+   <value>host3</value>
+</property>
+```

Review comment:
       The primordial node is an optional configuration, which is used in K8s.
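   
   A minimal sketch of that optional setting, assuming the `scm1` node ID from the quoted `ozone.scm.nodes.cluster1` list is the one chosen as primordial (the value is illustrative, not taken from a real deployment):
   
   ```XML
   <!-- Optional: mark one SCM as the primordial node (assumed value: scm1) -->
   <property>
      <name>ozone.scm.primordial.node.id</name>
      <value>scm1</value>
   </property>
   ```
   
   With this set, the same `scm --init` and `scm --bootstrap` sequence can be run on every node, which is what makes it convenient in K8s.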






[GitHub] [ozone] amaliujia commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595702664



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host1</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm2</name>
+   <value>host2</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm3</name>
+   <value>host3</value>
+</property>
+```

Review comment:
       thanks






[GitHub] [ozone] amaliujia commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595696488



##########
File path: hadoop-hdds/docs/content/feature/OM-HA.md
##########
@@ -27,15 +27,15 @@ Ozone has two leader nodes (*Ozone Manager* for key space management and *Storag
 
 To avoid any single point of failure the leader nodes also should have a HA setup.
 
- 1. HA of Ozone Manager is implemented with the help of RAFT (Apache Ratis)
- 2. HA of Storage Container Manager is [under implementation]({{< ref "scmha.md">}})
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explain the HA setup of Ozone Manager (OM) HA, please check [this page[({{< ref "SCM-HA" >}})].  While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services.

Review comment:
       Out of curiosity: 
   
   Is `[this page[({{< ref "SCM-HA" >}})]` markdown style? Is there a way to verify this is a valid link to SCM-HA doc?

##########
File path: hadoop-hdds/docs/content/feature/OM-HA.md
##########
@@ -27,15 +27,15 @@ Ozone has two leader nodes (*Ozone Manager* for key space management and *Storag
 
 To avoid any single point of failure the leader nodes also should have a HA setup.
 
- 1. HA of Ozone Manager is implemented with the help of RAFT (Apache Ratis)
- 2. HA of Storage Container Manager is [under implementation]({{< ref "scmha.md">}})
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explain the HA setup of Ozone Manager (OM) HA, please check [this page[({{< ref "SCM-HA" >}})].  While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services.
 
 ## Ozone Manager HA
 
-A single Ozone Manager uses [RocksDB](https://github.com/facebook/rocksdb/) to persiste metadata (volumes, buckets, keys) locally. HA version of Ozone Manager does exactly the same but all the data is replicated with the help of the RAFT consensus algorithm to follower Ozone Manager instances.
+A single Ozone Manager uses [RocksDB](https://github.com/facebook/rocksdb/) to persist metadata (volumes, buckets, keys) locally. HA version of Ozone Manager does exactly the same but all the data is replicated with the help of the RAFT consensus algorithm to follower Ozone Manager instances.

Review comment:
       I can update the Chinese version of this doc after this PR is merged.

##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host1</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm2</name>
+   <value>host2</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm3</name>
+   <value>host3</value>
+</property>
+```

Review comment:
       As I recall, a primary node ID needs to be added to the config? cc @GlenGeng to confirm

##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host1</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm2</name>
+   <value>host2</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm3</name>
+   <value>host3</value>
+</property>
+```
+
+For reliable HA support choose 3 independent nodes to form a quorum. 
+
+## Bootstrap
+
+The initialization of the **first** SCM-HA node is the same as a non-HA SCM:
+
+```
+bin/ozone scm --init
+```
+
+The second and third nodes should be *bootstrapped* instead of initialized. These nodes will join the configured RAFT quorum. The ID of the current server is identified by DNS name or can be set explicitly with `ozone.scm.node.id`. Most of the time you don't need to set it, as DNS-based ID detection works well.
+
+```
+bin/ozone scm --bootstrap
+```
+
+## Auto-bootstrap
+
+In some environments, such as containerized / K8s environments, we need a common, unified way to initialize the SCM HA quorum. As a reminder, the standard initialization flow is the following:
+
+ 1. On the first, "primordial" node, call `scm --init`
+ 2. On second/third nodes call `scm --bootstrap`
+
+This can be changed by using `ozone.scm.primordial.node.id` to define the primordial node. After setting it, you should execute **both** `scm --init` and `scm --bootstrap` on **all** nodes.
+
+Based on `ozone.scm.primordial.node.id`, the init process will be ignored on the second/third nodes, and the bootstrap process will be ignored on the primordial node.
+
+## Implementation details
+
+SCM HA uses Apache Ratis to replicate state between the members of the SCM HA quorum. Each node maintains the block management metadata in local RocksDB.
+
+This replication process is a simpler version of the OM HA replication process, as it doesn't use any double buffer (the overall DB throughput of SCM requests is lower).
+
+Datanodes send all the reports (container reports, pipeline reports...) to *all* the SCM nodes in parallel. Only the leader node can assign/create new containers, and only the leader node sends commands back to the Datanodes.
+
+## Verify SCM HA setup
+
+After starting an SCM HA cluster, you can validate that the SCM nodes form one single quorum instead of 3 individual SCM nodes.
+
+First, check if all the SCM nodes store the same ClusterId metadata:
+
+```bash
+cat /data/metadata/scm/current/VERSION
+```
+
+ClusterId is included in the VERSION file and should be the same in all the SCM nodes:
+
+```bash
+#Tue Mar 16 10:19:33 UTC 2021
+cTime=1615889973116
+clusterID=CID-130fb246-1717-4313-9b62-9ddfe1bcb2e7
+nodeType=SCM
+scmUuid=e6877ce5-56cd-4f0b-ad60-4c8ef9000882
+layoutVersion=0
+```
+
+You can also create data and double check with the `ozone debug` tool that all the container metadata is replicated.
+
+```shell
+bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets=1 --numOfKeys=10000 --keySize=524288 --replicationType=RATIS --numOfThreads=8 --factor=THREE --bufferSize=1048576
+ 
+ 
+# use debug ldb to check scm db on all the machines
+bin/ozone debug ldb --db=/tmp/metadata/scm.db/ ls
+ 
+ 
+bin/ozone debug ldb --db=/tmp/metadata/scm.db/ scan --with-keys --column_family=containers
+```
+
+## Migrating from existing SCM
+
+SCM HA can be turned on for any existing Ozone cluster. First enable Ratis (`ozone.scm.ratis.enable`) and configure only one node for the Ratis ring (`ozone.scm.nodes.NAME` should have one element).
+
+Start the cluster and test if it works well.
+
+If everything is fine, you can extend the cluster configuration with multiple nodes, restart the SCM node, and initialize the additional nodes with the `scm --bootstrap` command.

Review comment:
       Same, I can take the work to add a Chinese version afterwards.






[GitHub] [ozone] amaliujia commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595703075



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>

Review comment:
       I think there should be a node ID entry to specify which SCM this node is? E.g. scm1, or scm2, or scm3?
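   
   For reference, the Bootstrap section of the same patch names `ozone.scm.node.id` as the explicit override when DNS-based detection is not enough. A minimal sketch, assuming this host is the one listed as `scm1`:
   
   ```XML
   <!-- Optional: state explicitly which configured SCM this host is (assumed: scm1) -->
   <property>
      <name>ozone.scm.node.id</name>
      <value>scm1</value>
   </property>
   ```
   
   Each SCM host would carry its own value (`scm2`, `scm3`), so this entry differs per node while the rest of `ozone-site.xml` stays shared.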






[GitHub] [ozone] bshashikant merged pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
bshashikant merged pull request #2050:
URL: https://github.com/apache/ozone/pull/2050


   




[GitHub] [ozone] bshashikant commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595832539



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.

Review comment:
       I am fine with "metadata-manager nodes"






[GitHub] [ozone] bshashikant commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595756840



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure, the leader nodes should also have an HA setup.
+
+Both Ozone Manager and Storage Container Manager support HA. In this mode the internal state is replicated via RAFT (with Apache Ratis).
+
+This document explains the HA setup of Storage Container Manager (SCM). Please check [this page]({{< ref "OM-HA" >}}) for the HA setup of Ozone Manager (OM). While they can be set up for HA independently, a reliable, full HA setup requires enabling HA for both services.
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node sets, that is, multiple Ozone clusters. To select between the available SCM nodes, each cluster needs a logical name that can be resolved to the IP addresses (and domain names) of its Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in `ozone-site.xml`.
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each defined `serviceId`, a logical configuration name should be defined for each of its servers:
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>

Review comment:
       @amaliujia, this is not a mandatory config. If the local node id is not set, it is derived from the SCM address.
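
       For illustration only, an explicit node id entry might look roughly like the fragment below. This is a sketch under assumptions: the key name `ozone.scm.node.id` does not appear in the diff above and is assumed here, and `scm1` is simply one of the names from the `ozone.scm.nodes.cluster1` example.

       ```XML
       <property>
          <!-- Assumed key name, not taken from the diff. Optional: when it is
               not set, the local node id comes from the SCM address. -->
          <name>ozone.scm.node.id</name>
          <value>scm1</value>
       </property>
       ```

       Leaving the entry out keeps the same `ozone-site.xml` usable on every SCM host, which is presumably why it is not mandatory.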

##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.

Review comment:
       "Ozone has two leader nodes" seems confusing to me. Can we call them master nodes (or master services) instead?






[GitHub] [ozone] bshashikant commented on pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#issuecomment-801761399


   Thanks @elek, @amaliujia and @adoroszlai.




[GitHub] [ozone] elek commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595830007



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.

Review comment:
       The word *master* is controversial due to the `master/slave` meaning. I am not a native English speaker and I do not live in the US; my primary association is the [first meaning](https://www.dictionary.com/browse/master), not the second one.
   
   > a person with the ability or power to use, control, or dispose of something
   
   But I tried to avoid any confusion. (I have no idea what associations this word has in other regions...)
   
   What about "Ozone has two metadata-manager nodes"?






[GitHub] [ozone] elek commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595836317



##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.

Review comment:
       Yeah, "leader" is also misleading because of the meaning of RAFT leader... I got it. Fixed in 67c545c5d.



