You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/03/17 04:05:45 UTC

[GitHub] [ozone] amaliujia commented on a change in pull request #2050: HDDS-4948. SCM-HA documentation

amaliujia commented on a change in pull request #2050:
URL: https://github.com/apache/ozone/pull/2050#discussion_r595696488



##########
File path: hadoop-hdds/docs/content/feature/OM-HA.md
##########
@@ -27,15 +27,15 @@ Ozone has two leader nodes (*Ozone Manager* for key space management and *Storag
 
 To avoid any single point of failure the leader nodes also should have a HA setup.
 
- 1. HA of Ozone Manager is implemented with the help of RAFT (Apache Ratis)
- 2. HA of Storage Container Manager is [under implementation]({{< ref "scmha.md">}})
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explain the HA setup of Ozone Manager (OM) HA, please check [this page[({{< ref "SCM-HA" >}})].  While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services.

Review comment:
       Out of curiosity: 
   
   Is `[this page[({{< ref "SCM-HA" >}})]` markdown style? Is there a way to verify this is a valid link to SCM-HA doc?

##########
File path: hadoop-hdds/docs/content/feature/OM-HA.md
##########
@@ -27,15 +27,15 @@ Ozone has two leader nodes (*Ozone Manager* for key space management and *Storag
 
 To avoid any single point of failure the leader nodes also should have a HA setup.
 
- 1. HA of Ozone Manager is implemented with the help of RAFT (Apache Ratis)
- 2. HA of Storage Container Manager is [under implementation]({{< ref "scmha.md">}})
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explain the HA setup of Ozone Manager (OM) HA, please check [this page[({{< ref "SCM-HA" >}})].  While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services.
 
 ## Ozone Manager HA
 
-A single Ozone Manager uses [RocksDB](https://github.com/facebook/rocksdb/) to persiste metadata (volumes, buckets, keys) locally. HA version of Ozone Manager does exactly the same but all the data is replicated with the help of the RAFT consensus algorithm to follower Ozone Manager instances.
+A single Ozone Manager uses [RocksDB](https://github.com/facebook/rocksdb/) to persist metadata (volumes, buckets, keys) locally. HA version of Ozone Manager does exactly the same but all the data is replicated with the help of the RAFT consensus algorithm to follower Ozone Manager instances.

Review comment:
       I can make a change on Chinese version of this doc after this PR is merged.

##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host1</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host2</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host3</value>
+</property>
+```

Review comment:
       As I recall, there is a need to add a primary node id in config? cc @GlenGeng to confirm

##########
File path: hadoop-hdds/docs/content/feature/SCM-HA.md
##########
@@ -0,0 +1,162 @@
+---
+title: "SCM High Availability"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: HA setup for Storage Container Manager to avoid any single point of failure.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone has two leader nodes (*Ozone Manager* for key space management and *Storage Container Management* for block space management) and storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.
+
+<div class="alert alert-warning" role="alert">
+Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
+</div>
+
+To avoid any single point of failure the leader nodes also should have a HA setup.
+
+Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis) 
+
+This document explains the HA setup of Storage Container Manager (SCM), please check [this page]({{< ref "OM-HA" >}}) for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services. 
+
+## Configuration
+
+HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+
+```XML
+<property>
+   <name>ozone.scm.ratis.enable</name>
+   <value>true</value>
+</property>
+```
+One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
+
+This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
+
+Most of the time you need to set only the values of your current cluster:
+
+ ```XML
+<property>
+   <name>ozone.scm.service.ids</name>
+   <value>cluster1</value>
+</property>
+```
+
+For each of the defined `serviceId` a logical configuration name should be defined for each of the servers
+
+```XML
+<property>
+   <name>ozone.scm.nodes.cluster1</name>
+   <value>scm1,scm2,scm3</value>
+</property>
+```
+
+The defined prefixes can be used to define the address of each of the SCM services:
+
+```XML
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host1</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host2</value>
+</property>
+<property>
+   <name>ozone.scm.address.cluster1.scm1</name>
+   <value>host3</value>
+</property>
+```
+
+For reliable HA support choose 3 independent nodes to form a quorum. 
+
+## Bootstrap
+
+The initialization of the **first** SCM-HA node is the same as a none-HA SCM:
+
+```
+bin/ozone scm --init
+```
+
+Second and third nodes should be *bootstrapped* instead of init. These clusters will join to the configured RAFT quorum. The id of the current server is identified by DNS name or can be set explicitly by `ozone.scm.node.id`. Most of the time you don't need to set it as DNS based id detection can work well.
+
+```
+bin/ozone scm --bootstrap
+```
+
+## Auto-bootstrap
+
+In some environment -- such as containerized / K8s environment -- we need to have a common, unified way to initialize SCM HA quorum. As a remained, the standard initialization flow is the following:
+
+ 1. On the first, "primordial" node, call `scm --init`
+ 2. On second/third nodes call `scm --bootstrap`
+
+This can be changed with using `ozone.scm.primordial.node.id`. You can define the primordial node. After setting this node, you should execute **both** `scm --init` and `scm --bootstrap` on **all** nodes.
+
+Based on the `ozone.scm.primordial.node.id`, the init process will be ignored on the second/third nodes and bootstrap process will be ignored on all nodes except the primordial one.
+
+## Implementation details
+
+SCM HA uses Apache Ratis to replicate state between the members of the SCM HA quorum. Each node maintains the block management metadata in local RocksDB.
+
+This replication process is a simpler version of OM HA replication process as it doesn't use any double buffer (as the overall db thourghput of SCM requests )
+
+Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* the Datanodes parallel. Only the leader node can assign/create new containers, and only the leader node sends command back to the Datanodes.
+
+## Verify SCM HA setup
+
+After starting an SCM-HA it can be validated if the SCM nodes are forming one single quorum instead of 3 individual SCM nodes.
+
+First, check if all the SCM nodes store the same ClusterId metadata:
+
+```bash
+cat /data/metadata/scm/current/VERSION
+```
+
+ClusterId is included in the VERSION file and should be the same in all the SCM nodes:
+
+```bash
+#Tue Mar 16 10:19:33 UTC 2021
+cTime=1615889973116
+clusterID=CID-130fb246-1717-4313-9b62-9ddfe1bcb2e7
+nodeType=SCM
+scmUuid=e6877ce5-56cd-4f0b-ad60-4c8ef9000882
+layoutVersion=0
+```
+
+You can also create data and double check with `ozone debug` tool if all the container metadata is replicated.
+
+```shell
+bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets=1 --numOfKeys=10000 --keySize=524288 --replicationType=RATIS --numOfThreads=8 --factor=THREE --bufferSize=1048576
+ 
+ 
+// use debug ldb to check scm db on all the machines
+bin/ozone debug ldb --db=/tmp/metadata/scm.db/ ls
+ 
+ 
+bin/ozone debug ldb --db=/tmp/metadata/scm.db/ scan --with-keys --column_family=containers
+```
+
+## Migrating from existing SCM
+
+SCM HA can be turned on on any Ozone cluster. First enable Ratis (`ozone.scm.ratis.enable`) and configure only one node for the Ratis ring (`ozone.scm.nodes.NAME` should have one element).
+
+Start the cluster and test if it works well.
+
+If everything is fine, you can extend the cluster configuration with multiple nodes, restart SCM node, and initialize the additional nodes with `scm --bootstrap` command.

Review comment:
       Same, I can take the work to add a Chinese version afterwards.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org